2020
Conference Paper
Title

A provenance meta learning framework for missing data handling methods selection

Abstract
Missing data is a major problem in many real-world data sets and applications: it can lead to wrong or misleading analysis results and lowers both the quality of and the confidence in those results. Many missing data handling methods have been proposed in the research community, but no single method is universally best for all missing data problems. Selecting the right method for a specific problem usually depends on multiple intertwined factors. To simplify this method selection problem, we propose a Provenance Meta Learning Framework. We conducted an extensive literature review of 118 survey papers on missing data handling methods published from 2000 to 2019. Based on this review, we analyse 9 influential factors and 12 selection criteria for missing data handling methods, and we perform a detailed analysis of 6 popular methods: 4 machine learning methods, namely KNN Imputation (KNNI), Weighted KNN Imputation (WKNNI), K-Means Imputation (KMI), and Fuzzy KMI (FKMI), and 2 ad-hoc methods, namely Median/Mode Imputation (MMI) and Group/Class MMI (CMMI). We focus on selecting missing data handling methods for 3 classification techniques: C4.5, KNN, and RIPPER. Our evaluation uses 25 real-world data sets from the KEEL and UCI data set repositories. Our Provenance Meta Learning Framework suggests using KNNI to handle missing values when the missing data mechanism is Missing Completely At Random (MCAR), the missing data pattern is uni-attribute or monotone, the missing data rate is within [1%, 5%], the number of class labels is 2, and the sample size is no more than 10,000. Under these conditions, when the subsequent classification method is KNN or RIPPER, KNNI preserves classification performance better and achieves higher imputation accuracy and imputation exhaustiveness than the other 5 missing data handling methods.
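The recommendation stated at the end of the abstract can be read as a single selection rule. The following is a minimal sketch of that rule in Python, not the framework from the paper. The names DatasetProfile and suggests_knni, the string labels, and the reading of the listed conditions as a conjunction are all assumptions made for illustration; only the thresholds come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetProfile:
    """Hypothetical description of a data set; field names are illustrative."""
    mechanism: str       # missing data mechanism, e.g. "MCAR"
    pattern: str         # missing data pattern, e.g. "uni-attribute" or "monotone"
    missing_rate: float  # fraction of missing values, 0.03 means 3%
    n_classes: int       # number of class labels
    n_samples: int       # sample size
    classifier: str      # subsequent classifier, e.g. "KNN", "RIPPER", "C4.5"


def suggests_knni(p: DatasetProfile) -> bool:
    """Return True if the abstract's stated conditions for KNNI all hold.

    Assumption: the conditions are combined with AND. A False result only
    means this one rule does not fire; it says nothing about which of the
    other methods should be used instead.
    """
    return (
        p.mechanism == "MCAR"
        and p.pattern in {"uni-attribute", "monotone"}
        and 0.01 <= p.missing_rate <= 0.05
        and p.n_classes == 2
        and p.n_samples <= 10_000
        and p.classifier in {"KNN", "RIPPER"}
    )
```

For example, a binary data set with 5,000 samples, 3% values missing completely at random in a monotone pattern, and KNN as the classifier satisfies the rule, while the same data set with C4.5 as the classifier does not.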
Author(s)
Liu, Qian
Technische Universität Berlin
Hauswirth, Manfred  
Fraunhofer-Institut für Offene Kommunikationssysteme FOKUS  
Mainwork
11th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference, UEMCON 2020  
Project(s)
ProvDS
Funder
Deutsche Forschungsgemeinschaft DFG  
Conference
Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON) 2020  
DOI
10.1109/UEMCON51285.2020.9298089
Language
English
Keyword(s)
  • provenance meta learning
  • meta rule induction
  • automated missing data handling methods selection
  • missing data handling / treatment
