
Biological sequence analysis meets mobility mining

Jawad, A.; Kersting, K.; Andrienko, N.

Spiliopoulou, M.:
LWA 2011: Otto-von-Guericke-Universität Magdeburg, 28-30 September 2011. Technical report. Report of the symposium "Lernen, Wissen, Adaptivität 2011" of the GI special interest groups KDML, IR and WM
Magdeburg: Otto-von-Guericke-Universität Magdeburg, 2011
Symposium "Lernen, Wissen, Adaptivität" (LWA) <2011, Magdeburg>
Fraunhofer IAIS

Traffic and mobility analysis are fascinating, fast-growing areas of data mining and geographical information systems that impact the lives of billions of people every day. Another well-known scientific field that impacts the lives of billions is biological sequence analysis, which has evolved enormously in the recent past, especially since the Human Genome Project. So far, however, the two fields have never met. This is surprising, since both face a similar challenge, namely identifying relevant patterns in massive amounts of sequential data. There is a key difference, though: whereas biological sequence analysis has mainly focused on sequences over a small set of symbols, traffic and mobility mining often deal with sequences of continuous values. Thus, one may argue that the gap between the two fields is insurmountable. In this paper, we show that this is actually not the case. Using well-known discretization techniques such as stay-point detection and map matching, we can turn most, if not all, traffic sequences into "biological" sequences. We then apply the rich toolbox of biological sequence analysis to traffic data. For instance, simply viewing complex traffic data through the biological lens of sequence logos yields a novel, easy-to-grasp visualization of the data, called "Traffic Logos". Sequence alignment can be used for activity analysis, and profile hidden Markov models are well suited to capturing event persistence during event detection. Our empirical evaluation on three real-world data sets demonstrates that exploiting the link between traffic and DNA can yield state-of-the-art performance.
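The core idea (discretize a trajectory into symbols, then reuse a sequence-analysis tool on the result) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes planar coordinates in meters, a simplified threshold-based stay-point detector, a hypothetical `landmarks` mapping from letters to locations, and the standard information-content formula for sequence logos without small-sample correction. All function names, thresholds and the toy data are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# A GPS fix as (x_meters, y_meters, t_seconds). Planar coordinates are an
# assumption of this sketch; real GPS data would need a geodesic distance.
Point = Tuple[float, float, float]


def detect_stay_points(points: Sequence[Point], dist_thresh: float,
                       time_thresh: float) -> List[Tuple[float, float]]:
    """Return the mean (x, y) of each stay point in a time-ordered trajectory.

    A stay point is a run of consecutive fixes that all lie within
    dist_thresh of the run's first fix and span at least time_thresh seconds.
    """
    stays: List[Tuple[float, float]] = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        # Grow the run while fixes stay close to the anchor fix i.
        while j < n and math.dist(points[i][:2], points[j][:2]) <= dist_thresh:
            j += 1
        if points[j - 1][2] - points[i][2] >= time_thresh:
            run = points[i:j]
            stays.append((sum(p[0] for p in run) / len(run),
                          sum(p[1] for p in run) / len(run)))
            i = j  # resume scanning after the stay
        else:
            i += 1  # run too short: slide the anchor forward
    return stays


def to_symbols(stays: Sequence[Tuple[float, float]],
               landmarks: Dict[str, Tuple[float, float]],
               radius: float) -> str:
    """Turn stay points into a string, one letter per stay.

    Each stay gets the letter of its nearest landmark, or '?' if no landmark
    lies within radius (a hypothetical labelling scheme for illustration).
    """
    letters = []
    for stay in stays:
        label, loc = min(landmarks.items(), key=lambda kv: math.dist(stay, kv[1]))
        letters.append(label if math.dist(stay, loc) <= radius else "?")
    return "".join(letters)


def logo_heights(seqs: Sequence[str], alphabet: str) -> List[Dict[str, float]]:
    """Letter heights (in bits) for each column of a sequence logo.

    Column information = log2(|alphabet|) - Shannon entropy of the column;
    each letter's height is its frequency times that information.
    No small-sample correction is applied.
    """
    if not seqs or any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("expected a non-empty set of equal-length sequences")
    max_bits = math.log2(len(alphabet))
    columns = []
    for pos in range(len(seqs[0])):
        counts = Counter(s[pos] for s in seqs)
        freqs = {a: c / len(seqs) for a, c in counts.items()}
        entropy = -sum(f * math.log2(f) for f in freqs.values())
        info = max_bits - entropy
        columns.append({a: f * info for a, f in freqs.items()})
    return columns


if __name__ == "__main__":
    # Toy trajectory: a long stay near (0, 0), a transit fix, a long stay near (1000, 0).
    track = [(0, 0, 0), (5, 0, 300), (10, 0, 700), (500, 0, 900),
             (1000, 0, 1100), (1003, 0, 1500), (1005, 0, 2000)]
    stays = detect_stay_points(track, dist_thresh=50, time_thresh=600)
    print(to_symbols(stays, {"H": (0, 0), "W": (1000, 0)}, radius=100))  # HW
    print(logo_heights(["HWH", "HWS", "HWH", "HSH"], alphabet="HWS"))
```

Once trajectories are strings such as `"HW"`, any string-based bioinformatics tool applies unchanged. Here that tool is a logo computation over aligned daily sequences. A column in which every sequence shows the same place gets the full `log2(|alphabet|)` bits, while mixed columns shrink. Map matching (snapping fixes to road segments) would be an alternative discretization, but it needs a road network and is omitted here.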