
VOCUS: A Visual Attention System for Object Detection and Goal-Directed Search

Frintrop, S.
Cremers, A.B.; Hertzberg, J.


Berlin: Springer, 2006, XII, 216 pp.
Also: doctoral dissertation, University of Bonn, 2005
Lecture Notes in Artificial Intelligence, 3899
ISBN: 3-540-32759-2
ISBN: 978-3-540-32759-2
Fraunhofer IAIS
Keywords: visual attention; computer vision; robotic perception

Visual attention is a mechanism in human perception that selects relevant regions from a scene and provides them for higher-level processing such as object recognition. This enables humans to act effectively in their environment despite the complexity of the perceivable sensor data. Computational vision systems face the same problem as humans: there is a large amount of information to be processed, and to achieve this efficiently, perhaps even in real time for robotic applications, the order in which a scene is investigated must be determined in an intelligent way. A promising approach is to use computational attention systems that simulate human visual attention. This monograph introduces the biologically motivated computational attention system VOCUS (Visual Object detection with a CompUtational attention System), which detects regions of interest in images. It operates in two modes: an exploration mode, in which no task is provided, and a search mode with a specified target. In exploration mode, regions of interest are defined by strong contrasts (e.g., color or intensity contrasts) and by the uniqueness of a feature; for example, a black sheep is salient in a flock of white sheep. In search mode, the system uses previously learned information about a target object to bias the saliency computations with respect to the target. Various experiments show that the target is found on average with fewer than three fixations, that usually fewer than five training images suffice to learn the target information, and that the system is largely robust to viewpoint changes and illumination variations. Furthermore, we demonstrate how VOCUS profits from additional sensor data: we apply the system to depth and reflectance data from a 3D laser scanner and show the advantages that the laser modes provide.
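The exploration-mode idea described above (center-surround contrast maps weighted by how unique a feature is, so that a "black sheep" stands out) can be sketched roughly as follows. This is a simplified illustration in NumPy, not the actual VOCUS implementation: the single intensity channel, box-filter scales, and peak threshold are assumptions chosen for brevity.

```python
import numpy as np

def box_blur(img, radius):
    """Mean filter with edge-clamped borders (illustrative, unoptimized)."""
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros((h, w), dtype=float)
    k = 2 * radius + 1
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def contrast_map(img, center=1, surround=4):
    """Center-surround contrast: |fine-scale mean - coarse-scale mean|."""
    return np.abs(box_blur(img, center) - box_blur(img, surround))

def count_local_maxima(m, rel_thresh=0.5):
    """Count interior pixels exceeding all 8 neighbors and a threshold."""
    c = m[1:-1, 1:-1]
    neighbors = np.stack([
        m[:-2, :-2], m[:-2, 1:-1], m[:-2, 2:],
        m[1:-1, :-2],              m[1:-1, 2:],
        m[2:, :-2],  m[2:, 1:-1],  m[2:, 2:],
    ])
    is_peak = (c > neighbors.max(axis=0)) & (c >= rel_thresh * m.max())
    return int(is_peak.sum())

def uniqueness_weight(feature_map):
    """A map with few strong peaks (one black sheep) is weighted up;
    a map with many peaks (a whole flock) is weighted down."""
    n_peaks = max(count_local_maxima(feature_map), 1)
    return feature_map / np.sqrt(n_peaks)

# Toy "black sheep" scene: one dark patch on a bright background.
scene = np.ones((32, 32))
scene[12:16, 12:16] = 0.0
saliency = uniqueness_weight(contrast_map(scene))
```

In a fuller system, one such weighted map per feature channel (intensity, color, orientation) would be summed into the final saliency map; the most salient location is then the first fixation.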
By fusing the data of both modes, we demonstrate how the system is able to consider distinct object properties and how the flexibility of the system increases when different data are considered. Finally, the regions of interest provided by VOCUS serve as input to a classifier that recognizes the object in the detected region. We show how and in which cases classification is sped up, and how detection quality is improved, by the attentional front-end. This approach is especially useful when many object classes have to be considered, a situation that occurs frequently in robotics. VOCUS provides a powerful approach to improving existing vision systems by concentrating computational resources on regions that are more likely to contain relevant information. The more the complexity and power of vision systems increase in the future, the more they will profit from an attentional front-end like VOCUS.
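The search-mode biasing described in the abstract (learned target information steering the saliency computation) can be sketched as per-map weights: a feature map that responds mostly on the target becomes excitatory, one that responds mostly on the background becomes inhibitory. The weight formula, smoothing constant, and function names below are illustrative assumptions, not the exact VOCUS equations.

```python
import numpy as np

EPS = 1e-6  # assumed smoothing constant to avoid division by zero (illustrative)

def learn_weights(feature_maps, target_mask):
    """Per-map weight: mean response on the target region vs. the background.

    Maps responding mostly on the target get w > 1 (excitatory);
    maps responding mostly elsewhere get w < 1 (inhibitory).
    """
    return [
        (m[target_mask].mean() + EPS) / (m[~target_mask].mean() + EPS)
        for m in feature_maps
    ]

def search_saliency(feature_maps, weights):
    """Top-down saliency: excite target-like features, inhibit the rest."""
    out = np.zeros_like(feature_maps[0], dtype=float)
    for m, w in zip(feature_maps, weights):
        if w > 1.0:
            out += w * m           # excitation: feature supports the target
        else:
            out -= (1.0 / w) * m   # inhibition: feature speaks against it
    return out
```

With a handful of annotated training images (the abstract reports that usually fewer than five suffice), such weights bias the saliency map so that the first fixations land on target-like regions.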