Content-Based Video Tagging

  • Published : April 15, 2013
Content-based Video Tagging for Online Video Portals∗
Adrian Ulges1, Christian Schulze2, Daniel Keysers2, Thomas M. Breuel1
1 University of Kaiserslautern, Germany
2 German Research Center for Artificial Intelligence (DFKI), Kaiserslautern
{a_ulges,tmb}@informatik.uni-kl.de, {christian.schulze,daniel.keysers}@dfki.de

Abstract
Despite the increasing economic impact of the online video market, search in commercial video databases is still mostly based on user-generated meta-data. To complement this manual labeling, recent research efforts have investigated the interpretation of the visual content of a video to automatically annotate it. A key problem with such methods is the costly acquisition of a manually annotated training set.

In this paper, we study whether content-based tagging can be learned from user-tagged online video, a vast, public data source. We present an extensive benchmark using a database of real-world videos from the video portal youtube.com. We show that a combination of several visual features improves performance over our baseline system by about 30%.

1 Introduction

Due to the rapid spread of the web and growth of its bandwidth, millions of users have discovered online video as a source of information and entertainment. A market of significant economic impact has evolved that is often seen as a serious competitor for traditional TV broadcast. However, accessing the desired pieces of information in an efficient manner is a difficult problem due to the enormous quantity and diversity of video material published. Most commercial systems organize video access and search via meta-data like the video title or user-generated tags (e.g., youtube, myspace, clipfish) – an indexing method that requires manual work and is time-consuming, incomplete, and subjective.

While commercial systems neglect another valuable source of information, namely the content of a video, research in content-based video retrieval strives to automatically annotate (or ‘tag’) videos. Such systems learn connections between low-level visual features and high-level semantic concepts from a training set of annotated videos. Acquiring such a training set manually is costly and poses a key limitation to these content-based systems.

∗ Work supported partially by the Stiftung Rheinland-Pfalz für Innovation, project InViRe (961-386261/791)

Figure 1: Some sample keyframes extracted from a video with the tag ‘sailing’. Tagging such material is aggravated by the complexity of concepts (a,b), varying appearance (b,c), shots not directly visually linked to sailing (d,e), and low production quality (f).

In this paper, we study a different kind of training set, namely videos downloaded from online video portals (a similar idea has been published for images before, learning from Google Image Search [3]). Online videos are publicly available and come in a quantity that is unmatched by any dataset annotated for research purposes, providing a rich amount of tagged video content for a large number of concepts.
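
To make the learning setup concrete, the following is a minimal sketch of how tags might be learned from user-tagged videos. It is not the system benchmarked in this paper: the coarse color histogram, the nearest-neighbor voting, and all function names (color_histogram, train, score_tags) are assumptions chosen for illustration only. Each keyframe simply inherits all tags of the video it was extracted from, which is the weak labeling that user-tagged online video provides.

```python
from collections import Counter


def color_histogram(pixels, bins=2):
    """Normalized coarse RGB histogram of one keyframe.

    `pixels` is a non-empty list of (r, g, b) tuples with values in 0..255.
    This feature is an illustrative assumption, not the paper's feature set.
    """
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each channel to a bin index in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[ri * bins * bins + gi * bins + bi] += 1.0
    return [h / len(pixels) for h in hist]


def l1_distance(a, b):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def train(videos):
    """Build a list of (feature, tags) pairs.

    `videos` is a list of (keyframes, tags); every keyframe inherits all
    tags of its video, so the labels are weak (hypothetical setup).
    """
    index = []
    for keyframes, tags in videos:
        for frame in keyframes:
            index.append((color_histogram(frame), set(tags)))
    return index


def score_tags(index, keyframes, k=1):
    """Score tags for an unseen video by k-nearest-neighbor voting."""
    votes = Counter()
    for frame in keyframes:
        feat = color_histogram(frame)
        nearest = sorted(index, key=lambda item: l1_distance(feat, item[0]))[:k]
        for _, tags in nearest:
            votes.update(tags)
    total = sum(votes.values())
    return {tag: n / total for tag, n in votes.items()} if total else {}


# Toy data: bluish frames stand in for 'sailing', greenish ones for 'soccer'.
blue_frame = [(10, 40, 220)] * 4
green_frame = [(20, 200, 30)] * 4
index = train([([blue_frame], ["sailing"]), ([green_frame], ["soccer"])])
scores = score_tags(index, [[(0, 60, 250)] * 4])
```

In this toy run the bluish query frame falls into the same histogram bin as the ‘sailing’ training frame, so that tag receives the whole vote. Real online video would break this simple picture in exactly the ways Figure 1 illustrates, since frames of one tag vary widely in appearance and some are not visually related to the tag at all.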

On the downside, online video content is extraordinarily difficult to interpret automatically due to several of its characteristics. First, its diversity is enormous: online video is produced world-wide and under various conditions, ranging from private holiday snapshots to commercial TV shows. Second, semantic concepts are often only linked indirectly to the actual visual content. These phenomena are illustrated in Figure 1, which shows keyframes from online videos tagged with the concept ‘sailing’. The visual appearance of frames varies for several reasons: first of all, the concept itself is so complex that it can be linked to multiple views, like shots of the boat or ocean views (1(a), 1(b)). Second, the appearance varies greatly among shots of the same kind (1(b), 1(c)). Third, there are shots not directly linked to sailing in a visual sense (1(d), 1(e)) and garbage frames...