The INRIA-LIM-VocR and AXES submissions for TrecVid 2014 multimedia event detection

Douze, M.; Oneata, D.; Paulin, M.; Leray, C.; Chesneau, N.; Potapov, D.; Verbeek, J.; Alahari, K.; Harchaoui, Z.; Lamel, L.; Gauvain, J.-L.; Schmidt, Christoph Andreas; Schmid, C.

2014

Conference Paper

Abstract

This paper describes our participation in the 2014 edition of the TrecVid Multimedia Event Detection task. Our system is based on a collection of local visual and audio descriptors, which are aggregated into global descriptors, one for each type of low-level descriptor, using Fisher vectors. Besides these features, we use two features based on convolutional networks: one for the visual channel and one for the audio channel. Additional high-level features are extracted using ASR and OCR. Finally, we use mid-level attribute features based on object and action detectors trained on external datasets. Our two submissions (INRIA-LIM-VocR and AXES) are identical in all components except for the ASR system used. We present an overview of the features and the classification techniques, and experimentally evaluate our system on TrecVid MED 2011 data.

Author(s)

Douze, M.

Oneata, D.

Paulin, M.

Leray, C.

Chesneau, N.

Potapov, D.

Verbeek, J.

Alahari, K.

Harchaoui, Z.

Lamel, L.

Gauvain, J.-L.

Schmidt, Christoph Andreas

Schmid, C.

Mainwork

TREC Video Retrieval Evaluation, TRECVID 2014. Notebook papers and slides. Online resource

Conference

TREC Video Retrieval Evaluation Workshop (TRECVID) 2014
