Multimodal Information Spaces for Content-based Image Retrieval

Abstract

Currently, image retrieval by content is a research problem of great interest in academia and industry, due to the large collections of images available in different contexts. One of the main challenges in developing effective image retrieval systems is the automatic identification of semantic image contents. This research proposal aims to design a model for image retrieval able to take advantage of different data sources, i.e. using multimodal information, to improve the response of an image retrieval system. In particular, two data modalities associated with the contents and context of images are considered in this proposal: visual features and unstructured text annotations. The proposed framework is based on kernel methods, which provide two important advantages over traditional multimodal approaches: first, the structure of each modality is preserved in a high-dimensional feature space, and second, they provide natural ways to fuse feature spaces into a unique information space. This document presents the research agenda to build a Multimodal Information Space for searching images by content.
Juan Carlos Caicedo Rueda
Prof. Fabio A. González O., Ph.D.
Information Retrieval and Machine Learning.
Content-Based Image Retrieval (CBIR) is an active research discipline focused on computational strategies for searching relevant images based on visual content analysis. In this proposal, multimodal analysis is considered for developing CBIR systems, especially for image collections in which some text is associated with the images. Multimodality in Information Retrieval sometimes refers to the interaction mechanisms and devices used to query the system. From the Multimedia Information Retrieval perspective, however, multimodality refers to methods that take advantage of different data modalities to provide access to a digital library or a multimedia collection [1, 2]. Different data modalities in multimedia are used to better understand document contents, including textual annotations, audio, images and video. In this proposal, multimodal will refer to the ability to represent, process and analyze two data modalities simultaneously: unstructured texts and images.

This document proposes the study of multimodal information retrieval systems. In particular, it proposes the design of computational strategies that take advantage of multimodal interactions between image contents and text descriptions to improve the response of an image retrieval system. In addition, it proposes the evaluation of different query paradigms for searching images, including query by example, keyword-based queries and multimodal queries. A unified framework is proposed to manage data representation, search algorithms and query resolution, centered on the study and evaluation of kernel methods to generate Multimodal Information Spaces. How kernel methods can be adapted to address the problems of a multimodal information retrieval system is one of the main questions of this research. This proposal aims to approach both practical and theoretical aspects of a multimodal information representation for image retrieval systems.
Kernel methods provide foundations for including structure in data representations and for combining different heterogeneous data sources. Kernel methods for pattern analysis have been studied to design machine learning algorithms and have been widely used for non-vectorial data such as strings, trees and graphs, among others. Adapting such a framework to information retrieval, and especially to multimodal information retrieval, may lead to more effective systems and may also contribute to the understanding of the relationships between information retrieval and machine learning.
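To make the fusion idea concrete, the following is a minimal sketch (not part of the proposal) of how per-modality kernels can be combined: a convex combination of valid kernels is itself a valid kernel, which gives a single multimodal similarity over images described by both visual features and text annotations. The choice of an RBF kernel for the visual modality, a linear kernel for the text modality, and the weight alpha are illustrative assumptions only.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian (RBF) Gram matrix between the rows of X and Y.
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def linear_kernel(X, Y):
    # Plain inner-product kernel, a common choice for bag-of-words text vectors.
    return X @ Y.T

def multimodal_kernel(vis_X, vis_Y, txt_X, txt_Y, alpha=0.5):
    # Convex combination of per-modality kernels: each modality keeps its own
    # feature space, and the weighted sum fuses them into one information space.
    return alpha * rbf_kernel(vis_X, vis_Y) + (1.0 - alpha) * linear_kernel(txt_X, txt_Y)

# Toy data: 4 images, each with an 8-dim visual descriptor and a 5-dim text vector.
rng = np.random.default_rng(0)
V = rng.random((4, 8))
T = rng.random((4, 5))
K = multimodal_kernel(V, V, T, T, alpha=0.5)
print(K.shape)  # (4, 4)
```

Any kernel-based retrieval or learning algorithm can then consume K directly, since the combined matrix remains symmetric and positive semidefinite.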