Title: The Fraunhofer IDMT at ImageCLEF 2011 photo annotation task
Authors: Nagel, K.; Nowak, S.; Wolter, K.; Kühhirt, U.
Year: 2011
Type: conference paper
Language: en
URL: https://publica.fraunhofer.de/handle/publica/396237
Scopus EID: 2-s2.0-84922031742
Record date: 2022-03-13

Abstract: This paper presents the participation of the Fraunhofer IDMT in the ImageCLEF 2011 Photo Annotation Task. Our approach focuses on text-based features and on strategies for combining visual and textual information. First, we apply a pre-processing step to the provided Flickr tags to reduce noise. For each concept, tf-idf values per tag are computed and used to construct a text-based descriptor. Second, we extract RGB-SIFT descriptors using the codebook approach. Visual and text-based features are combined, once with early fusion and once with late fusion. The concepts are learned with SVM classifiers. Further, a post-processing step compares tags and concept names with each other. Our submission consists of one text-only run and four multi-modal runs. The results show that combining text-based and visual features improves performance. The best results are achieved with the late fusion approach. The post-processing step improves the results for only some concepts and worsens them for others. Overall, we scored a Mean Average Precision (MAP) of 37.1% and an example-based F-Measure (F-ex) of 55.2%.
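
The difference between the early and late fusion strategies named in the abstract can be illustrated with a minimal Python sketch. The record does not say how the authors combined classifier outputs, so the weighted average used for late fusion here is an assumption. The function names and the `text_weight` parameter are also hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def early_fusion(visual: Sequence[float], textual: Sequence[float]) -> List[float]:
    """Early fusion: concatenate the visual and text-based feature vectors
    of one image, so that a single classifier is trained on the joint vector."""
    return list(visual) + list(textual)


def late_fusion(visual_score: float, text_score: float,
                text_weight: float = 0.5) -> float:
    """Late fusion: combine the scores of two separately trained classifiers.

    Assumption: a weighted average is used as the combination rule; the
    source does not specify the rule the authors applied.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")
    return (1.0 - text_weight) * visual_score + text_weight * text_score


if __name__ == "__main__":
    # Hypothetical toy values: a 2-dimensional visual histogram and a
    # 1-dimensional tag descriptor for one image.
    joint = early_fusion([0.1, 0.2], [1.0])
    print("early fusion vector:", joint)

    # Hypothetical per-concept scores from a visual and a text classifier.
    print("late fusion score:", late_fusion(0.8, 0.4, text_weight=0.25))
```

In early fusion the classifier sees both modalities at once, whereas in late fusion each modality gets its own classifier and only the per-concept scores are merged.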