2005
Conference Paper
Title
A goal-directed visual attention system
Abstract
Suppose you are looking for your key. You know it is somewhere on your desk, but it still takes several fixations until your roaming gaze hits it. If the key has a salient fob that contrasts with the desk, you will detect it faster. This is consistent with the separation of visual processing into two subtasks: first, a fast parallel attention system detects object candidates and, second, complex recognition restricted to these regions verifies the hypothesis. The focus of this work is on the first task, i.e., on the pre-selection of object candidates with a computational attention system. In human perception, the focus of attention is guided by two factors: bottom-up attention detects regions that stand out from the rest of the scene, for example a black sheep among white ones, and top-down attention guides the gaze according to knowledge, expectations and goals. Existing computer models of visual attention focus mainly on the bottom-up aspect. Here, we present a robust computational attention system that enables goal-directed search. A standard bottom-up architecture [1] based on the Feature Integration Theory [2] is extended by a top-down component that weights features according to previously learned weights. In learning mode, the user marks the target by drawing a rectangle around it with the mouse. The system autonomously determines which region inside the rectangle is most salient and uses this region for learning. The learned weights consider not only the properties of the target (excitation) but also those of the background (inhibition). In search mode, the system uses the learned weights to excite or inhibit the features in the scene and directs the focus to the region that most likely contains the target. Detailed performance results are presented on artificial images as well as on a wide variety of real-world images. The target is typically among the first 3 focused regions, making the system a robust and time-saving front-end for object recognition.
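The learning and search modes described in the abstract can be illustrated with a small Python sketch. It is not the system's implementation: the abstract does not state how weights are computed or applied, so the ratio of target to background activation as the weight, the logarithmic gain, and all names (`learn_weights`, `top_down_saliency`, `focus_of_attention`, the feature names `red` and `blue`) are assumptions made for illustration only.

```python
import math
from typing import Dict, List, Tuple

Map = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def region_means(fmap: Map, box: Box) -> Tuple[float, float]:
    """Return (mean inside the box, mean outside the box) of one feature map."""
    top, left, bottom, right = box
    inside, outside = [], []
    for r, row in enumerate(fmap):
        for c, value in enumerate(row):
            if top <= r < bottom and left <= c < right:
                inside.append(value)
            else:
                outside.append(value)
    return sum(inside) / len(inside), sum(outside) / len(outside)


def learn_weights(feature_maps: Dict[str, Map], box: Box,
                  eps: float = 1e-6) -> Dict[str, float]:
    """Assumed rule: weight = target activation / background activation.

    A weight above 1 marks a feature typical of the target (excitation),
    a weight below 1 marks a feature typical of the background (inhibition).
    """
    weights = {}
    for name, fmap in feature_maps.items():
        m_in, m_out = region_means(fmap, box)
        weights[name] = (m_in + eps) / (m_out + eps)
    return weights


def top_down_saliency(feature_maps: Dict[str, Map],
                      weights: Dict[str, float]) -> Map:
    """Combine feature maps with gain log(w): positive excites, negative inhibits.

    The logarithmic gain is an illustrative choice, not taken from the paper.
    """
    first = next(iter(feature_maps.values()))
    out = [[0.0] * len(first[0]) for _ in first]
    for name, fmap in feature_maps.items():
        gain = math.log(weights[name])
        for r, row in enumerate(fmap):
            for c, value in enumerate(row):
                out[r][c] += gain * value
    return out


def focus_of_attention(saliency: Map) -> Tuple[int, int]:
    """Return the (row, col) of the most salient location."""
    return max(
        ((r, c) for r, row in enumerate(saliency) for c in range(len(row))),
        key=lambda rc: saliency[rc[0]][rc[1]],
    )


def uniform(value: float, size: int = 4) -> Map:
    """Helper for the toy scenes: a size x size map filled with one value."""
    return [[value] * size for _ in range(size)]


# Learning mode: the user's rectangle covers the cell (1, 1).
train = {"red": uniform(0.1), "blue": uniform(0.5)}
train["red"][1][1] = 0.9    # the target is strongly red
train["blue"][1][1] = 0.1   # and hardly blue, unlike the background
learned = learn_weights(train, (1, 1, 2, 2))

# Search mode: target at (3, 0), a stronger blue distractor at (0, 2).
scene = {"red": uniform(0.1), "blue": uniform(0.1)}
scene["red"][3][0] = 0.8
scene["blue"][0][2] = 1.0
focus = focus_of_attention(top_down_saliency(scene, learned))
```

In this toy scene an unweighted sum of the two maps would peak at the blue distractor, whereas the learned weights move the focus to the red target, which is the behaviour the abstract attributes to the top-down component.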