The effect of semantic relatedness measures on multi-label classification evaluation
In this paper, we explore different ways of formulating new evaluation measures for multi-label image classification when the vocabulary of the collection adopts the hierarchical structure of an ontology. We apply several semantic relatedness measures based on web-search engines, WordNet, Wikipedia and Flickr to the ontology-based score (OS) proposed in . The final objective is to assess the benefit of integrating semantic distances into the OS measure. To this end, we evaluate the resulting measures in a real-case scenario: the results (73 runs) submitted by 19 research teams to the ImageCLEF 2009 Photo Annotation Task. Two experiments were conducted to understand which aspects of annotation behaviour each measure captures most effectively. First, we compare the system rankings produced by the different evaluation measures. This is done by computing the Kendall τ and Kolmogorov-Smirnov correlation between the rankings produced by each pair of measures. Second, we investigate how stably the different measures react to artificially introduced noise in the ground truth. We conclude that the distributional measures based on image information sources show promising behaviour in terms of both ranking and stability.
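To illustrate the ranking comparison in the first experiment, the following minimal sketch computes the Kendall τ between the scores that two evaluation measures assign to the same set of runs. The function name `kendall_tau`, the tau-a variant (no correction for ties) and the score values are assumptions made for illustration only; they are not taken from the ImageCLEF 2009 results or from the paper's own implementation.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall tau-a between two score lists over the same runs.

    Tied pairs count as neither concordant nor discordant
    (no tie correction; an assumption of this sketch).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both measures must score the same runs")
    n = len(scores_a)
    if n < 2:
        raise ValueError("at least two runs are required")

    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # The sign of the product tells whether both measures order
        # runs i and j the same way (positive) or oppositely (negative).
        product = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Hypothetical scores for five runs under two evaluation measures.
measure_a = [0.81, 0.74, 0.69, 0.55, 0.42]
measure_b = [0.78, 0.80, 0.60, 0.58, 0.30]

tau = kendall_tau(measure_a, measure_b)
print(f"Kendall tau = {tau:.2f}")
```

A value close to 1 indicates that the two measures rank the runs almost identically, whereas a value close to 0 indicates that the rankings are largely unrelated. In this made-up example only the first two runs swap places between the two measures.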