Here you will find scientific publications from the Fraunhofer Institutes.

The impact of biased sampling of event logs on the performance of process discovery

Authors: Fani Sani, M.; Zelst, S.J. van; Aalst, W.M.P. van der

Computing 103 (2021), No. 6, pp. 1085-1104
ISSN: 0010-485X (Print)
ISSN: 1436-5057 (Online)
Journal article, electronic publication
Fraunhofer FIT

Process discovery algorithms derive process models from event data captured during the execution of business processes. These algorithms tend to use the whole event data. For large event data, it is no longer feasible to run them on standard hardware in limited time. A straightforward way to overcome this problem is to downsize the data using a random sampling method. However, little research has been conducted on selecting the right sample, given the available time and the characteristics of the event data. This paper systematically evaluates various biased sampling methods and assesses their performance on different datasets using four different discovery techniques. Our experiments show that biased sampling can considerably speed up discovery techniques without sacrificing the quality of the resulting process model. Furthermore, because sampling implicitly filters the data by removing outliers, the model quality may even improve.
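The difference between random and biased sampling can be illustrated with a minimal sketch. It assumes an event log represented as a list of traces (each trace a list of activity names) and uses one plausible bias, keeping the most frequent trace variants. The abstract does not specify which biased strategies the paper evaluates, so the function names and the frequency-based criterion here are illustrative assumptions only.

```python
import random
from collections import Counter


def random_sample(log, k, seed=0):
    """Unbiased baseline: pick k traces uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(log, k)


def frequency_biased_sample(log, k):
    """Hypothetical biased strategy: keep one trace for each of the
    k most frequent variants (distinct activity sequences)."""
    variant_counts = Counter(tuple(trace) for trace in log)
    return [list(variant) for variant, _ in variant_counts.most_common(k)]


# Toy event log: two frequent variants and one rare variant (a possible outlier).
log = (
    [["a", "b", "c"]] * 5
    + [["a", "c", "b"]] * 3
    + [["a", "b", "b", "c"]] * 1
)

sampled = frequency_biased_sample(log, 2)
print(sampled)
```

In this toy log the rare variant is excluded from the biased sample, which mirrors the implicit outlier filtering mentioned in the abstract. A random sample of the same size may or may not include it.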