Biological sequence analysis meets mobility mining
Traffic and mobility analysis are fascinating and fast-growing areas of data mining and geographical information systems that impact the lives of billions of people every day. Another well-known scientific field that impacts the lives of billions is biological sequence analysis, which has evolved at an incredible pace in the recent past, especially since the Human Genome Project. So far, however, the two fields have never met. This is surprising, since both face a similar challenge: identifying relevant patterns in massive amounts of sequential information. Admittedly, whereas biological sequence analysis has mainly focused on sequences over a small alphabet of symbols, traffic and mobility mining often deals with sequences of continuous values. Thus, one may argue that the gap between them is insurmountable. In this paper, we show that this is not the case. Using well-known discretization techniques such as stay-point detection and map matching, we can turn most, if not all, traffic sequences into "biological" sequences. We can then apply the rich toolbox of biological sequence analysis to traffic data. For instance, simply looking at complex traffic data through the biological lens of sequence logos yields a novel, easy-to-grasp visualization of the data, which we call "Traffic Logos". Sequence alignment can be used for activity analysis, and profile hidden Markov models are well suited to capturing event persistence during event detection. Our empirical evaluation on three real-world data sets demonstrates that exploiting the link between traffic and DNA can result in state-of-the-art performance.
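To make the sequence-logo idea concrete, the following minimal sketch computes the letter heights of a logo from discretized mobility sequences. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the function name `logo_heights`, the three-symbol alphabet, and the toy sequences are hypothetical, the sequences are assumed to be already discretized and of equal length, and the standard information-content formula is used without a small-sample correction.

```python
import math
from collections import Counter


def logo_heights(sequences, alphabet):
    """Return per-position letter heights (in bits) for a sequence logo.

    Assumes equal-length sequences whose symbols all belong to `alphabet`.
    The height of a symbol is its relative frequency at that position times
    the position's information content, log2(|alphabet|) - entropy.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must have equal length")

    max_bits = math.log2(len(alphabet))
    columns = []
    for i in range(length):
        counts = Counter(s[i] for s in sequences)
        total = sum(counts.values())
        # Relative frequency of each symbol observed at position i.
        freqs = {a: counts[a] / total for a in alphabet if counts[a] > 0}
        entropy = -sum(f * math.log2(f) for f in freqs.values())
        info = max_bits - entropy
        columns.append({a: f * info for a, f in freqs.items()})
    return columns


# Hypothetical discretized daily sequences: H = home, W = work, S = shop.
days = ["HWWH", "HWSH", "HWWH", "HSWH"]
heights = logo_heights(days, "HWS")
for position, column in enumerate(heights):
    print(position, {symbol: round(h, 3) for symbol, h in column.items()})
```

Positions where all sequences agree (here the first and last, always "home") receive the maximum height of log2(3) bits, while mixed positions are drawn shorter, which is what makes a logo easy to read at a glance.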