A Generic Mid-level Representation for Semantic Video Analysis (MP-L2)

Author(s) :

Qing Tang	(School of Information Technologies, University of Sydney, Australia)
Joo-Hwee Lim	(Institute for Infocomm Research, Singapore)
Jesse S. Jin	(School of Information Technologies, University of Sydney, Australia)
Haiping Sun	(Institute for Infocomm Research, National University of Singapore, Singapore)
Qi Tian	(Institute for Infocomm Research, Singapore)

Abstract :

This paper presents a generic mid-level representation for efficient semantic video analysis that works P frame by P frame rather than shot by shot. Each P frame is partitioned into an m-row by n-column grid, and each cell is called a ‘block’. The representation can bridge the semantic gap by building an intermediate description of video features across frames and blocks. Soccer video is used to showcase the potential of the framework for real video processing; experiments with tennis and news video have also been conducted. The results demonstrate that the framework performs well in semantic analysis and indicate its further potential for automatic video analysis.
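
As an illustration of the grid partition described in the abstract, the following minimal sketch splits one frame into m by n blocks and computes one number per block. It is not the authors' implementation: the function names `partition_into_blocks` and `block_means`, the frame as a plain list of rows, and the mean value as a per-block feature are all assumptions made for this example. The abstract does not say which features the representation actually stores.

```python
def partition_into_blocks(frame, m, n):
    """Split a 2-D frame (a list of equal-length rows) into an m-by-n grid.

    Returns a dict mapping (block_row, block_col) to the block, itself a
    list of rows. Integer edge arithmetic guarantees that every sample
    lands in exactly one block, even when the frame size is not a
    multiple of m or n.
    """
    height = len(frame)
    width = len(frame[0])
    if not (1 <= m <= height and 1 <= n <= width):
        raise ValueError("grid must fit inside the frame")

    # Block boundaries along each axis, e.g. height=4, m=2 -> [0, 2, 4].
    row_edges = [i * height // m for i in range(m + 1)]
    col_edges = [j * width // n for j in range(n + 1)]

    blocks = {}
    for i in range(m):
        for j in range(n):
            rows = frame[row_edges[i]:row_edges[i + 1]]
            blocks[(i, j)] = [row[col_edges[j]:col_edges[j + 1]] for row in rows]
    return blocks


def block_means(frame, m, n):
    """Return an m-by-n table holding the mean value of each block.

    The mean is only a placeholder feature (an assumption of this sketch).
    """
    means = [[0.0] * n for _ in range(m)]
    for (i, j), block in partition_into_blocks(frame, m, n).items():
        values = [v for row in block for v in row]
        means[i][j] = sum(values) / len(values)
    return means


# Hypothetical 4x6 "frame" whose sample at (r, c) equals r * 6 + c.
example_frame = [[r * 6 + c for c in range(6)] for r in range(4)]
example_means = block_means(example_frame, m=2, n=3)
```

Applying `block_means` to every P frame in sequence would give a frames-by-blocks table, which is one simple way to picture a description "across frames and blocks".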
