Here you will find scientific publications from the Fraunhofer Institutes.

Distributed online learning for large-scale pattern prediction over real-time event streams

Author: Qadah, Ehab
: Mock, Michael; Wrobel, Stefan

Full text urn:nbn:de:0011-n-4846551 (5.9 MB PDF)
MD5 Fingerprint: a370c01b0248079f397116d7d4274e7f
Created on: 1 March 2018

Bonn, 2018, VII, 60 pp.
Bonn, Univ., Master Thesis, 2018
European Commission EC
H2020; 687591; datAcron
Big Data Analytics for Time Critical Mobility
Master Thesis, Electronic Publication
Fraunhofer IAIS

In many application domains, such as maritime surveillance, financial services, network monitoring, and sensor networks, massive amounts of streaming data are generated in real time. The records of these streams can be encoded as events. To benefit from live event streams, however, systems are needed that support large-scale, real-time processing and analytics. For instance, predicting event patterns that represent situations of interest from massive event streams is an important capability for real-time decision making. It makes it possible to react proactively to new situations and to improve the effectiveness of operational tasks. In this thesis, we present the design, implementation, and evaluation of a scalable prediction system for user-defined patterns over multiple massive streams of events. The proposed system is based on a novel approach that combines probabilistic event pattern prediction models on multiple predictor nodes with a distributed online learning protocol, which continuously learns the parameters of a global prediction model in a communication-efficient way and shares it among the predictors. For scalability, the system is implemented on top of Apache Flink, a popular engine for distributed, large-scale stream processing. The key idea is to enable collaborative learning and information exchange between the distributed predictors by sharing a global prediction model, so that learning converges faster and each predictor needs less data. We describe the distributed architecture and implementation of the proposed system, along with a theoretical analysis that gives a probabilistic learning guarantee for the proposed synchronized global model. Our empirical evaluations show the effectiveness of the proposed approach on synthetic and real-world event streams in the maritime domain.
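
The idea of sharing a global model in a communication-efficient way can be illustrated with a minimal sketch. It is not the thesis implementation and does not use Apache Flink. Everything in it is an assumption made for illustration: the names `Predictor`, `Coordinator`, and `maybe_sync`, the local model (a probability vector over event types), the exponential update rule, and the drift threshold `delta`. Each predictor learns from its own stream, and the coordinator averages the local models only when one of them has drifted far enough from the last shared model, which saves communication while the models agree.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class Predictor:
    """Hypothetical predictor node with a local event-type distribution."""

    def __init__(self, n_types, lr=0.1):
        self.p = [1.0 / n_types] * n_types  # start from a uniform model
        self.lr = lr                        # assumed learning rate

    def observe(self, event_type):
        # Online update: move the estimate toward the observed event type.
        # The vector remains a probability distribution after the update.
        self.p = [
            (1 - self.lr) * q + (self.lr if i == event_type else 0.0)
            for i, q in enumerate(self.p)
        ]


class Coordinator:
    """Hypothetical coordinator that synchronizes predictors on demand."""

    def __init__(self, predictors, delta):
        self.predictors = predictors
        self.delta = delta                       # assumed drift threshold
        self.reference = list(predictors[0].p)   # last shared global model
        self.syncs = 0

    def maybe_sync(self):
        # Communicate only if some local model drifted beyond the threshold.
        drifted = any(
            sq_dist(pr.p, self.reference) > self.delta
            for pr in self.predictors
        )
        if not drifted:
            return False
        n = len(self.predictors)
        # Global model = average of the local models.
        self.reference = [
            sum(pr.p[i] for pr in self.predictors) / n
            for i in range(len(self.reference))
        ]
        for pr in self.predictors:
            pr.p = list(self.reference)          # share it with every node
        self.syncs += 1
        return True


a, b = Predictor(2, lr=0.5), Predictor(2, lr=0.5)
coordinator = Coordinator([a, b], delta=0.1)
a.observe(0)                  # only node a sees an event
coordinator.maybe_sync()      # node a drifted, so the models are averaged
```

In this toy run, node `b` receives information from an event it never observed, which is the sense in which a shared model lets each predictor learn from less local data.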