  • Publication
    MissFormer: (In-)Attention-Based Handling of Missing Observations for Trajectory Filtering and Prediction
    In applications such as object tracking, time-series data inevitably carry missing observations. Following the success of deep learning-based models for various sequence learning tasks, these models increasingly replace classic approaches in object tracking applications for inferring the objects' motion states. While traditional tracking approaches can deal with missing observations, most of their deep counterparts are, by default, not suited for this. Towards this end, this paper introduces a transformer-based approach for handling missing observations in trajectory data of variable input length. The model is formed indirectly by successively increasing the complexity of the demanded inference tasks. Starting from reproducing noise-free trajectories, the model then learns to infer trajectories from noisy inputs. By providing missing tokens (binary-encoded missing events), the model learns to in-attend to missing data and infers a complete trajectory conditioned on the remaining inputs. In the case of a sequence of successive missing events, the model then acts as a pure prediction model. The abilities of the approach are demonstrated on synthetic data and real-world data reflecting prototypical object tracking scenarios.
  • Publication
    Quantifying the Complexity of Standard Benchmarking Datasets for Long-Term Human Trajectory Prediction
    Methods to quantify the complexity of trajectory datasets are still a missing piece in benchmarking human trajectory prediction models. To gain a better understanding of the complexity of trajectory prediction tasks, and following the intuition that more complex datasets contain more information, an approach is proposed for quantifying the amount of information contained in a dataset from a prototype-based dataset representation. The dataset representation is obtained by first employing a non-trivial spatial sequence alignment, which enables a subsequent learning vector quantization (LVQ) stage. A large-scale complexity analysis is conducted on several human trajectory prediction benchmarking datasets, followed by a brief discussion of the implications for human trajectory prediction and benchmarking.
  • Publication
    A complementary trajectory prediction benchmark
    Existing benchmarks targeting the overall performance of trajectory prediction models lack the possibility of gaining insight into a model's behavior under specific conditions. Towards this end, a new benchmark aiming to take on a complementary role compared to existing benchmarks is proposed. It consists of synthetically generated and modified real-world trajectories from established datasets with scenario-dependent test and training splits. The benchmark provides a hierarchy of three inference tasks, representation learning, de-noising, and prediction, comprised of several test cases targeting specific aspects of a given machine learning model. This allows a differentiated evaluation of the model's behavior and generalization capabilities. As a result, a sanity check for single trajectory models is provided aiming to prevent failure cases and highlighting requirements for improving modeling capabilities.
  • Publication
    A Short Note on Analyzing Sequence Complexity in Trajectory Prediction Benchmarks
    The analysis and quantification of sequence complexity is an open problem frequently encountered when defining trajectory prediction benchmarks. In order to enable a more informative assembly of a data basis, an approach for determining a dataset representation in terms of a small set of distinguishable prototypical sub-sequences is proposed. The approach employs a sequence alignment followed by a learning vector quantization (LVQ) stage. A first proof of concept on synthetically generated and real-world datasets shows the viability of the approach.
  • Publication
    RED: A simple but effective Baseline Predictor for the TrajNet Benchmark
    In recent years, there has been a shift from modeling the tracking problem based on a Bayesian formulation towards using deep neural networks. Towards this end, in this paper the effectiveness of various deep neural networks for predicting future pedestrian paths is evaluated. The analyzed deep networks rely, as in the traditional approaches, solely on observed tracklets without human-human interaction information. The evaluation is done on the publicly available TrajNet benchmark dataset [39], which builds up a repository of considerable and popular datasets for trajectory prediction. We show how a Recurrent-Encoder with a Dense layer stacked on top, referred to as RED-predictor, is able to achieve the top rank at the TrajNet 2018 challenge compared to elaborated models. Further, we investigate failure cases, give explanations for observed phenomena, and give some recommendations for overcoming the demonstrated shortcomings.
  • Publication
    Joint detection and online multi-object tracking
    (2018)
    Kieritz, Hilke
    Most multiple object tracking methods rely on object detection methods in order to initialize new tracks and to update existing tracks. Although strongly interconnected, tracking and detection are usually addressed as separate building blocks. However, both parts can benefit from each other, e.g. the affinity model from the tracking method can reuse appearance features already calculated by the detector, and the detector can use object information from the past in order to avoid missed detections. Towards this end, we propose a multiple object tracking method that jointly performs detection and tracking in a single neural network architecture. By training both parts together, we can use optimized parameters instead of heuristic decisions over the track lifetime. We adapt the Single Shot MultiBox Detector (SSD) [14] to serve single frame detection to a recurrent neural network (RNN), which combines detections into tracks. We show an initial proof of concept on the DETRAC [26] benchmark with competitive results, illustrating the feasibility of learnable track management. We conclude with a discussion of open problems on the MOT16 [15] benchmark.
  • Publication
    On the reliability of LSTM-MDL models for pedestrian trajectory prediction
    Recurrent neural networks, like the LSTM model, have been applied to various sequence learning tasks with great success. Following this, it seems natural to use LSTM models for predicting future locations in object tracking tasks. In this paper, we evaluate an adaptation of an LSTM-MDL model and investigate its reliability in the context of pedestrian trajectory prediction. Thereby, we demonstrate the fallacy of solely relying on prediction metrics for evaluating the model and how the model's capabilities can lead to suboptimal prediction results. Towards this end, two experiments are provided. Firstly, the model's prediction abilities are evaluated on publicly available surveillance datasets. Secondly, the capabilities of capturing motion patterns are examined. Further, we investigate failure cases and give explanations for observed phenomena, granting insight into the model's reliability in tracking applications. Lastly, we give some hints on how the demonstrated shortcomings may be circumvented.
  • Publication
    Video-based log generation for security systems in indoor surveillance scenarios
    In the EU FP7-SEC-2012-1-313034 project SAWSOC (Situation AWare Security Operations Center) the objective is to achieve "the convergence of physical and logical security technologies, particularly improving correlation techniques across existing technology silos (video surveillance, access control, network monitoring, etc.)". In this paper, two use cases developed in SAWSOC are presented from the perspective of video-based log generation. The first is a critical infrastructure where we log visually observable occurrences in a critical server room. The second is a soccer stadium environment where we log the patrol path of guards and ensure the correct handling of each checkpoint of their patrol path. Our approach consists of several generic computer vision modules and spatio-temporal data fusion using scene-dependent knowledge. No single component on its own allows any statement to be made about the current situation in the observed area. Instead, the sum of all components has to be considered.
  • Publication
    The thermal infrared visual object tracking VOT-TIR2016 challenge results
    (2016)
    Felsberg, Michael; Kristan, Matej; Matas, Jiri; Krah, Sebastian; et al.
    The Thermal Infrared Visual Object Tracking challenge 2016, VOT-TIR2016, aims at comparing short-term single-object visual trackers that work on thermal infrared (TIR) sequences and do not apply pre-learned models of object appearance. VOT-TIR2016 is the second benchmark on short-term tracking in TIR sequences. Results of 24 trackers are presented. For each participating tracker, a short description is provided in the appendix. The VOT-TIR2016 challenge is similar to the 2015 challenge; the main difference is the introduction of new, more difficult sequences into the dataset. Furthermore, the VOT-TIR2016 evaluation adopted the improvements regarding overlap calculation in VOT2016. Compared to VOT-TIR2015, a significant general improvement of results has been observed, which partly compensates for the more difficult sequences. The dataset, the evaluation kit, as well as the results are publicly available at the challenge website.
  • Publication
    Action recognition in the longwave infrared and the visible spectrum using Hough forests
    Action recognition in surveillance systems has to work 24/7 under all kinds of weather and lighting conditions. However, most action recognition systems work only in the visible spectrum, which limits their general usage to daytime applications. In this work, Hough forests are applied to the longwave infrared spectrum, which can capture humans both in the dark and in daylight. Further, Integral Channel Features, which have shown promising results in the spatial domain, are applied to the spatio-temporal domain and are incorporated into the Hough forest approach. This approach is evaluated on a new outdoor dataset containing different violent and non-violent actions recorded in the visible and infrared spectrum. It is further shown that for the visible spectrum the proposed approach achieves state-of-the-art results on the KTH and i3DPost datasets.