Over-complete Representation and Fusion for Semantic Concept Detection (WA-S1)
Author(s):
Apostol Natsev (IBM T J Watson Research, USA)
Milind Naphade (IBM T J Watson Research, USA)
Ching-Yung Lin (IBM T J Watson Research, USA)
John R. Smith (IBM T J Watson Research, USA)
Abstract: Automatic semantic concept detection in images is a promising tool for alleviating the user effort in annotating and cataloging digital media collections. It enables the automatic identification of people, places, and objects, for example for enhanced indexing and searching of home photographs. While constructing robust semantic detectors has been shown to be feasible for global, generic concepts with a sufficient number of good training examples (e.g., indoors, outdoors), many interesting concepts, such as face and people, occur at sub-picture granularity: they occupy only a portion of the image and therefore frequently have training examples with a reduced signal-to-noise ratio. Such regional concepts are also harder to detect because imperfections in automatic image segmentation algorithms lead to inaccurate object boundaries and corresponding low-level feature ambiguities. In this paper we focus on the problem of boosting the detection performance of existing regional concept detectors by exploiting detection redundancy. Specifically, we propose to apply the same detector multiple times to evaluate and combine multiple detection hypotheses for the same content, each at a different content granularity, in order to reduce detection sensitivity to segmentation errors. We validate the approach using Support Vector Machine classifiers for 27 regional semantic concepts from the NIST TRECVID 2003 common annotation lexicon, and show significant performance improvements by evaluating and fusing detection results at four different granularities.
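The core idea of the abstract, running one concept detector at several content granularities and fusing the resulting hypotheses, can be illustrated with a minimal sketch. Note that the abstract does not specify the fusion operator, so the average and max rules below (and the example scores) are assumptions for illustration only, not the authors' actual method.

```python
# Illustrative sketch (not the paper's code): fusing confidence scores
# produced by one regional concept detector applied at several content
# granularities, so that a segmentation error at one granularity is
# dampened by the other hypotheses.

def fuse_scores(scores_by_granularity, method="avg"):
    """Combine per-granularity detector scores into one detection score.

    scores_by_granularity: list of floats, one score per granularity
    (e.g., four levels, matching the paper's experiments).
    method: fusion rule -- "avg" or "max" (an assumed choice; the
    abstract does not state which operator is used).
    """
    if not scores_by_granularity:
        raise ValueError("need at least one granularity score")
    if method == "avg":
        return sum(scores_by_granularity) / len(scores_by_granularity)
    if method == "max":
        return max(scores_by_granularity)
    raise ValueError(f"unknown fusion method: {method}")


# Hypothetical 'face' detector scores at four granularities.
scores = [0.35, 0.62, 0.71, 0.54]
print(fuse_scores(scores, "avg"))  # mean of the four hypotheses
print(fuse_scores(scores, "max"))  # most confident hypothesis
```

Averaging tends to smooth out a single segmentation failure, while the max rule trusts the best-matching granularity; either could serve as a baseline fusion strategy before trying a learned combiner.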
