Fraunhofer-Gesellschaft

Publica

Here you will find scientific publications from the Fraunhofer Institutes.

Generic error identification in data sets

 
Authors: El Bekri, Nadia; Peinsipp, Byma

Full text: urn:nbn:de:0011-n-4263663 (303 KByte PDF)
MD5 fingerprint: 3526cab35c9ff2fd92191c5bb02614c9
Created on: 21.12.2016


Harris, F.C. ; International Society for Computers and Their Applications -ISCA-:
25th International Conference on Software Engineering and Data Engineering, SEDE 2016 : Denver, Colorado, USA, 26-28 September 2016; Co-located with the 29th International Conference on Computer Applications in Industry and Engineering (CAINE 2016)
Red Hook, NY: Curran, 2016
ISBN: 978-1-5108-2897-1
ISBN: 978-1-943436-05-7
pp. 177-182
International Conference on Software Engineering and Data Engineering (SEDE) <25, 2016, Denver/Colo.>
International Conference on Computer Applications in Industry and Engineering (CAINE) <29, 2016, Denver/Colo.>
English
Conference paper, electronic publication
Fraunhofer IOSB
data mining; clustering; association analysis; association rules

Abstract
Manual data acquisition is common in many areas; the United States Environmental Protection Agency (EPA) [1], for example, acquires data this way. This type of acquisition can introduce many errors into a data set, and such errors can distort the rules and patterns that data mining algorithms extract. An example of an erroneous entry is a vehicle fuel consumption that is far too high because a decimal separator is missing. A customer who considers buying this vehicle and looks up its fuel consumption in the EPA database could be influenced in their purchase decision by the incorrect entry. Manual inspection of a data set is very time-consuming and impractical for large data sets, so inspection needs automatic procedures to remain accurate. This paper illustrates an approach that identifies errors using association rules. The rules are generated by combining various algorithms from the fields of clustering and association analysis. These association rules can help prevent erroneous data entries in advance.
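
The idea of flagging erroneous entries with association rules can be illustrated with a minimal sketch. It is not the authors' implementation: the vehicle records, the attribute names, the thresholds, and the helper functions `discretize`, `mine_rules`, and `find_violations` are all hypothetical, and fixed bin boundaries stand in for the clustering step that the abstract mentions. The sketch mines simple one-to-one rules and reports records that contradict a high-confidence rule.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: (vehicle class, fuel consumption). The value 61.0
# mimics an entry of 6.1 with a missing decimal separator.
raw = [("compact", 6.1), ("compact", 5.8), ("compact", 6.4), ("compact", 5.9),
       ("compact", 61.0), ("suv", 11.2), ("suv", 12.0), ("suv", 10.8),
       ("suv", 11.5)]

def discretize(value):
    # Assumption: fixed bins replace the clustering step of the paper.
    return "low" if value < 8 else "high" if value < 15 else "extreme"

records = [{"class": c, "consumption": discretize(v)} for c, v in raw]

def mine_rules(records, min_support=0.3, min_confidence=0.75):
    """Mine rules of the form (attr=value) -> (attr=value)."""
    n = len(records)
    item_counts, pair_counts = Counter(), Counter()
    for rec in records:
        items = sorted(rec.items())
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        if count / n < min_support:
            continue  # the item pair is too rare to support a rule
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, confidence))
    return rules

def find_violations(records, rules):
    """Return indices of records that match a rule's antecedent
    but contradict its consequent."""
    flagged = set()
    for idx, rec in enumerate(records):
        for (a_attr, a_val), (c_attr, c_val), _ in rules:
            if rec.get(a_attr) == a_val and rec.get(c_attr) != c_val:
                flagged.add(idx)
    return sorted(flagged)

rules = mine_rules(records)
flagged = find_violations(records, rules)
print(flagged)
```

In this toy data set the rule "class=compact implies consumption=low" holds for four of the five compact vehicles, so the record with the value 61.0 is the one reported for review. A production system would likely rely on an established frequent-itemset algorithm and on thresholds tuned to the data.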

URL: http://publica.fraunhofer.de/dokumente/N-426366.html