Publication

MissFormer: (In-)Attention-Based Handling of Missing Observations for Trajectory Filtering and Prediction

2021 , Becker, Stefan , Hug, Ronny , Hübner, Wolfgang , Arens, Michael , Morris, Brendan Tran

In applications such as object tracking, time-series data inevitably carry missing observations. Following the success of deep learning-based models for various sequence learning tasks, these models increasingly replace classic approaches in object tracking applications for inferring the objects' motion states. While traditional tracking approaches can deal with missing observations, most of their deep counterparts are, by default, not suited for this. Towards this end, this paper introduces a transformer-based approach for handling missing observations in variable input length trajectory data. The model is formed indirectly by successively increasing the complexity of the demanded inference tasks. Starting from reproducing noise-free trajectories, the model then learns to infer trajectories from noisy inputs. By providing missing tokens, binary-encoded missing events, the model learns to in-attend to missing data and infers a complete trajectory conditioned on the remaining inputs. In the case of a sequence of successive missing events, the model then acts as a pure prediction model. The abilities of the approach are demonstrated on synthetic data and real-world data reflecting prototypical object tracking scenarios.

Publication

A complementary trajectory prediction benchmark

2020 , Hug, Ronny , Becker, Stefan , Hübner, Wolfgang , Arens, Michael

Existing benchmarks targeting the overall performance of trajectory prediction models lack the possibility of gaining insight into a model's behavior under specific conditions. Towards this end, a new benchmark aiming to take on a complementary role compared to existing benchmarks is proposed. It consists of synthetically generated and modified real-world trajectories from established datasets with scenario-dependent test and training splits. The benchmark provides a hierarchy of three inference tasks, representation learning, de-noising, and prediction, comprised of several test cases targeting specific aspects of a given machine learning model. This allows a differentiated evaluation of the model's behavior and generalization capabilities. As a result, a sanity check for single trajectory models is provided aiming to prevent failure cases and highlighting requirements for improving modeling capabilities.

Publication

Computer Vision for Medical Infant Motion Analysis: State of the Art and RGB-D Data Set

2019 , Hesse, Nikolas , Bodensteiner, Christoph , Arens, Michael , Hofmann, Ulrich G. , Weinberger, Raphael , Schroeder, Sebastian A.

Assessment of spontaneous movements of infants lets trained experts predict neurodevelopmental disorders like cerebral palsy at a very young age, allowing early intervention for affected infants. An automated motion analysis system must accurately capture body movements, ideally without markers or attached sensors so as not to affect the movements of infants. A vast majority of recent approaches for human pose estimation focus on adults, leading to a degradation of accuracy if applied to infants. Hence, multiple systems for infant pose estimation have been developed. Due to the lack of publicly available benchmark data sets, a standardized evaluation, let alone a comparison of different approaches, is impossible. We fill this gap by releasing the Moving INfants In RGB-D (MINI-RGBD) data set (available for research purposes at http://s.fhg.de/mini-rgbd), created using the recently introduced Skinned Multi-Infant Linear body model (SMIL). We map real infant movements to the SMIL model with realistic shapes and textures, and generate RGB and depth images with precise ground truth 2D and 3D joint positions. We evaluate our data set with state-of-the-art methods for 2D pose estimation in RGB images and for 3D pose estimation in depth images. Evaluation of 2D pose estimation results in PCKh rates of 88.1% and 94.5% (depending on correctness threshold), and PCKh rates of 64.2% and 90.4%, respectively, for 3D pose estimation. We hope to foster research in medical infant motion analysis to get closer to an automated system for early detection of neurodevelopmental disorders.

Publication

Monocular 3D Vehicle Trajectory Reconstruction Using Terrain Shape Constraints

2018 , Bullinger, Sebastian , Bodensteiner, Christoph , Arens, Michael

This work proposes a novel approach to reconstruct three-dimensional vehicle trajectories in monocular video sequences. We leverage state-of-the-art instance-aware semantic segmentation and optical flow methods to compute object video tracks on pixel level. This approach uses Structure from Motion to determine camera poses relative to vehicle instances and environment structures. We parameterize vehicle trajectories with a single variable by combining object and background reconstructions. The naive combination of vehicle and environment reconstruction results in inconsistent motion trajectories due to the scale ambiguity of SfM. We determine consistent object trajectories by projecting dense vehicle reconstructions on the terrain surface. Our scale ratio estimation approach shows no degenerated camera-vehicle-motions. We demonstrate the usefulness of our approach using publicly available video data of driving scenarios. We extend this evaluation showing trajectory reconstruction results using drone footage. We use synthetic data of vehicles in urban environments to evaluate the proposed algorithm. We achieve an average reconstruction-to-ground-truth distance of 0.17 meter.

Publication

Context Sensitivity of Spatio-Temporal Activity Detection using Hierarchical Deep Neural Networks in Extended Videos

2020 , Hertlein, Felix , Münch, David , Arens, Michael

The amount of available surveillance video data is increasing rapidly and therefore makes manual inspection impractical. The goal of activity detection is to automatically localize activities spatially and temporally in a large collection of video data. In this work we answer the question of to what extent context plays a role in spatio-temporal activity detection in extended videos. Towards this end we propose a hierarchical pipeline for activity detection which spatially localizes objects first and subsequently generates spatio-temporal action tubes. Additionally, a suitable metric for performance evaluation is enhanced. We evaluate our system using the TRECVID 2019 ActEV challenge dataset. We investigated the sensitivity by detecting activities multiple times with various spatial margins around the performing actor. The results showed that our pipeline and metric are suited for detecting activities in extended videos.

Publication

3D Object Trajectory Reconstruction using Instance-Aware Multibody Structure from Motion and Stereo Sequence Constraints

2019 , Bullinger, Sebastian , Bodensteiner, Christoph , Arens, Michael , Stiefelhagen, Rainer

Three-dimensional environment perception is a key element of autonomous driving and driver assistance systems. A common image-based approach to determine three-dimensional scene information is stereo matching, which is limited by the stereo camera baseline. In contrast to stereo matching based methods, we present an approach to reconstruct three-dimensional object trajectories combining temporally adjacent views for object point triangulation. We track two-dimensional object shapes on pixel level exploiting instance-aware semantic segmentation techniques and optical flow cues. We apply Structure from Motion (SfM) to object and background images to determine initial camera poses relative to object instances as well as background structures and refine the initial SfM results by integrating stereo camera constraints using factor graphs. We compute object trajectories using stereo sequence constraints of object and background reconstructions. We show qualitative results using publicly available video data of driving sequences. Due to the lack of suitable ground truth, we create a synthetic benchmark dataset of stereo sequences with vehicles in urban environments. Our algorithm achieves an average trajectory error of 0.09 meter using the dataset. The dataset is publicly available on our website.

Publication

3D Object Trajectory Reconstruction using Stereo Matching and Instance Flow based Multiple Object Tracking

2019 , Bullinger, Sebastian , Bodensteiner, Christoph , Arens, Michael

This paper presents a method to reconstruct three-dimensional object motion trajectories in stereo video sequences. We apply stereo matching to each image pair of a stereo sequence to compute corresponding binocular disparities. By combining instance-aware semantic segmentation techniques and optical flow cues, we track two-dimensional object shapes on pixel level. This allows us to determine object-specific disparities and corresponding object points for each frame pair. By applying Structure from Motion (SfM) we compute camera poses with respect to background structures. We embed the vehicle trajectories into the environment reconstruction by combining the object point cloud of each image pair with corresponding camera poses contained in the background SfM reconstruction. We show qualitative results on the KITTI and CityScapes datasets and compare our method quantitatively with previously published monocular approaches on synthetic data of vehicles in an urban environment. We achieve an average trajectory error of 0.11 meter.

Publication

Change Detection and Deformation Analysis based on Mobile Laser Scanning Data of Urban Areas

2020 , Gehrung, Joachim , Hebel, Marcus , Arens, Michael , Stilla, Uwe

Change detection is an important tool for processing multiple epochs of mobile LiDAR data in an efficient manner, since it makes an otherwise time-consuming operation manageable by focusing on regions of interest. State-of-the-art approaches usually either do not handle the case of incomplete observations or are computationally expensive. We present a novel method based on a combination of point clouds and voxels that is able to handle said case while being computationally less expensive than comparable approaches. Furthermore, our method is able to identify special classes of changes such as partially moved, fully moved and deformed objects in addition to the appeared and disappeared objects recognized by conventional approaches. The performance of our method is evaluated using the publicly available TUM City Campus datasets, showing an overall accuracy of 88 %.

Publication

RED: A simple but effective Baseline Predictor for the TrajNet Benchmark

2019 , Becker, Stefan , Hug, Ronny , Hübner, Wolfgang , Arens, Michael

In recent years, there has been a shift from modeling the tracking problem based on Bayesian formulation towards using deep neural networks. Towards this end, in this paper the effectiveness of various deep neural networks for predicting future pedestrian paths is evaluated. The analyzed deep networks rely, as in the traditional approaches, solely on observed tracklets without human-human interaction information. The evaluation is done on the publicly available TrajNet benchmark dataset [39], which builds up a repository of considerable and popular datasets for trajectory prediction. We show how a Recurrent-Encoder with a Dense layer stacked on top, referred to as RED-predictor, is able to achieve top rank at the TrajNet 2018 challenge compared to elaborated models. Further, we investigate failure cases, give explanations for observed phenomena, and give some recommendations for overcoming demonstrated shortcomings.

Publication

A fast voxel-based indicator for change detection using low resolution octrees

2019 , Gehrung, Joachim , Hebel, Marcus , Arens, Michael , Stilla, Uwe

This paper proposes a change detection approach that uses a low-resolution octree enhanced with Gaussian kernels to describe free and occupied space. This so-called Gaussian Occupancy Octree is derived from range measurements and used to represent spatial information for a single epoch. Changes between epochs are encoded using a Delta Octree. A qualitative and quantitative evaluation of the proposed approach shows that its advantages are a fast runtime and the ability to make a statement about the re-exploration of space. An evaluation of the classification accuracy shows that our approach tends towards correct classifications with an overall accuracy of 51.5 %, but is also systematically biased towards the appearance of occupied space.