DiffAnt: Diffusion Models for Action Anticipation

Zhong, Zeyun; Wu, Chengzhi; Martin, Manuel; Voit, Michael; Gall, Jürgen; Beyerer, Jürgen

doi:10.48550/arXiv.2311.15991

2023

Paper (Preprint, Research Paper, Review Paper, White Paper, etc.)

Abstract

Anticipating future actions is inherently uncertain. Given an observed video segment containing ongoing actions, multiple subsequent actions can plausibly follow. This uncertainty becomes even larger when predicting far into the future. However, the majority of existing action anticipation models adhere to a deterministic approach, neglecting to account for future uncertainties. In this work, we rethink action anticipation from a generative view, employing diffusion models to capture different possible future actions. In this framework, future actions are iteratively generated from standard Gaussian noise in the latent space, conditioned on the observed video, and subsequently transitioned into the action space. Extensive experiments on four benchmark datasets, i.e., Breakfast, 50Salads, EpicKitchens, and EGTEA Gaze+, are performed and the proposed method achieves superior or comparable results to state-of-the-art methods, showing the effectiveness of a generative approach for action anticipation. Our code and trained models will be published on GitHub.
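The generative procedure outlined in the abstract (start from standard Gaussian noise in a latent space, denoise iteratively while conditioning on the observed video, then map the result into the action space) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses generic DDPM-style ancestral sampling with a linear noise schedule, and everything named here (toy_clean_latent, w_cond, w_out, the feature and latent sizes, the random weights) is a hypothetical stand-in for trained components that the record does not describe.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (assumes num_steps >= 2)."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta          # cumulative product of (1 - beta_t)
        alpha_bars.append(prod)
    return betas, alpha_bars


def matvec(vec, mat):
    """Row vector times a matrix stored as a list of rows."""
    return [sum(v * row[j] for v, row in zip(vec, mat))
            for j in range(len(mat[0]))]


def toy_clean_latent(video_feat, w_cond):
    """Hypothetical stand-in for a trained, video-conditioned denoiser:
    guesses the clean latent from the observed-video feature alone."""
    return [math.tanh(x) for x in matvec(video_feat, w_cond)]


def sample_future_actions(video_feat, w_cond, w_out,
                          num_future=4, num_steps=50, seed=0):
    """Draw one sequence of future action labels by reverse diffusion."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    x0_hat = toy_clean_latent(video_feat, w_cond)
    dim = len(x0_hat)
    # One latent vector per future action, initialised as Gaussian noise.
    z = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_future)]
    for t in reversed(range(num_steps)):
        a_bar, beta = alpha_bars[t], betas[t]
        for row in z:
            for d in range(dim):
                # Noise implied by the current latent and the clean-latent guess.
                eps_hat = (row[d] - math.sqrt(a_bar) * x0_hat[d]) / math.sqrt(1.0 - a_bar)
                # Standard DDPM posterior mean; fresh noise except at the last step.
                mean = (row[d] - beta / math.sqrt(1.0 - a_bar) * eps_hat) / math.sqrt(1.0 - beta)
                noise = rng.gauss(0.0, 1.0) if t > 0 else 0.0
                row[d] = mean + math.sqrt(beta) * noise
    # Transition from latent space to action space: linear head plus argmax.
    actions = []
    for row in z:
        logits = matvec(row, w_out)
        actions.append(max(range(len(logits)), key=logits.__getitem__))
    return actions


# Usage with made-up sizes: 6-d video feature, 3-d latent, 5 action classes.
init = random.Random(42)
video_feat = [init.gauss(0.0, 1.0) for _ in range(6)]
w_cond = [[init.gauss(0.0, 1.0) for _ in range(3)] for _ in range(6)]
w_out = [[init.gauss(0.0, 1.0) for _ in range(5)] for _ in range(3)]
sample_a = sample_future_actions(video_feat, w_cond, w_out, seed=1)
sample_b = sample_future_actions(video_feat, w_cond, w_out, seed=2)
```

Calling the sampler with different seeds starts from different noise, which is how a generative formulation can propose more than one plausible future for the same observed video; with this toy denoiser the samples may or may not differ.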

Author(s)

Zhong, Zeyun

Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung IOSB

Wu, Chengzhi

Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung IOSB

Martin, Manuel

Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung IOSB

Voit, Michael

Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung IOSB

Gall, Jürgen

Beyerer, Jürgen

Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung IOSB
