2016
Conference Paper
Title
Generic error identification in data sets
Abstract
Manual data acquisition is common in many areas, for example at the United States Environmental Protection Agency (EPA) [1]. This type of data acquisition can introduce many errors into a data set, and such errors can distort the rules and patterns that data mining algorithms extract. One example of an erroneous entry is an implausibly high fuel consumption value for a vehicle, caused by a missing decimal comma. A customer who is considering this vehicle and looks up its fuel consumption in the EPA database could then base the purchase decision on an incorrect entry. Manual inspection of a data set is very time-consuming and impractical for large data sets, so automatic procedures are needed to keep the data accurate. This paper presents an approach to identifying errors by means of association rules. The association rules are generated by combining several algorithms from the fields of clustering and association analysis. These rules can help prevent erroneous data entries in advance.
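The abstract does not say which clustering or association algorithms are combined, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the paper's method. Fixed thresholds in a hypothetical `bin_fuel` helper stand in for the clustering step that discretizes fuel consumption. Rules are restricted to one antecedent and one consequent. The support and confidence thresholds and the sample records are invented for illustration. A record is flagged when it matches a high-confidence rule's antecedent but not its consequent.

```python
from collections import Counter
from itertools import permutations


def bin_fuel(mpg):
    # Hypothetical fixed thresholds standing in for a clustering step.
    if mpg < 20:
        return "low"
    if mpg <= 40:
        return "mid"
    return "high"


def mine_rules(records, min_support, min_confidence):
    """Mine single-antecedent rules (attr_a=val_a) -> (attr_b=val_b)."""
    n = len(records)
    item_counts = Counter()
    pair_counts = Counter()
    for rec in records:
        items = sorted(rec.items())
        item_counts.update(items)
        # Count every ordered pair of items that co-occur in one record.
        for a, b in permutations(items, 2):
            pair_counts[(a, b)] += 1
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        confidence = count / item_counts[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, confidence))
    return rules


def flag_violations(records, rules):
    """Return (index, rule) pairs where a record matches a rule's
    antecedent but contradicts its consequent."""
    flagged = []
    for idx, rec in enumerate(records):
        for (attr_a, val_a), (attr_b, val_b), conf in rules:
            if rec.get(attr_a) == val_a and rec.get(attr_b) != val_b:
                flagged.append((idx, ((attr_a, val_a), (attr_b, val_b), conf)))
    return flagged


# Illustrative data only: (vehicle class, cylinders, fuel economy in mpg).
raw = (
    [("compact", 4, 30.0 + i) for i in range(9)]   # 30..38 mpg -> "mid"
    + [("compact", 4, 310.0)]                      # decimal comma lost: 31,0 keyed as 310
    + [("suv", 8, 14.0 + i) for i in range(5)]     # 14..18 mpg -> "low"
)
records = [
    {"class": c, "cylinders": str(cyl), "mpg": bin_fuel(m)} for c, cyl, m in raw
]

rules = mine_rules(records, min_support=0.3, min_confidence=0.85)
flagged = flag_violations(records, rules)
suspicious = sorted({idx for idx, _ in flagged})
print(suspicious)  # [9]
```

On this invented data, rules such as class=compact -> mpg=mid hold with confidence 0.9, and only the record with the value 310 violates them. Discretizing first matters: a rule over raw numeric values would rarely reach the support threshold, whereas binned values let frequent co-occurrences surface.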
Rights
Use according to copyright law
Language
English