Image Retrieval with Textual Label Similarity Features

Date: 01 January 2015
DOI: http://doi.org/10.1002/isaf.1364
Published date: 01 January 2015
Authors: Alicia Sagae, Scott E. Fahlman
IMAGE RETRIEVAL WITH TEXTUAL LABEL SIMILARITY FEATURES
ALICIA SAGAE* AND SCOTT E. FAHLMAN
Language Technologies Institute, Carnegie Mellon University, Los Angeles, CA, USA
SUMMARY
This article presents a knowledge-based solution for retrieving English descriptions of images. We analyse the errors made by a baseline system that relies on term frequency, and we find that the task requires deeper semantic representation. Our solution is to perform incremental, task-driven development of an ontology. Ontological features are then applied in a machine-learning algorithm for ranking candidate image descriptions. This work demonstrates the advantage of combining knowledge-based and statistical approaches for text retrieval, and it establishes the important result that an empirically tuned task-specific ontology performs better than a domain-general resource like WordNet, even on previously unseen examples. Copyright © 2015 John Wiley & Sons, Ltd.
Keywords: image retrieval; textual similarity; textual inference
1. INTRODUCTION
This paper describes experiments to retrieve images based on matching their descriptive English labels. As in ad-hoc document retrieval, a baseline system using term vectors to represent these labels performs reasonably well (>80% mean reciprocal rank (MRR)). However, error analysis reveals that the most challenging examples for this task require a richer feature space, allowing the system to capture more of the deep semantic similarities that humans seem to notice when they make comparisons between images and their descriptions. As a result, we present a solution that uses knowledge-based features for identifying when two English descriptions refer to the same image.
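To make the baseline concrete, the sketch below (an illustration for this excerpt, not the system evaluated in the paper) represents each label as a term-frequency vector, ranks a collection of labels by cosine similarity to a query description, and scores the ranking with mean reciprocal rank. The toy labels, queries, and function names are illustrative assumptions.

```python
# Minimal sketch of a term-frequency retrieval baseline with MRR evaluation.
# The data and function names below are illustrative, not from the paper.
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words term-frequency vector for a short English label."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def mean_reciprocal_rank(queries, labels, relevant):
    """MRR over queries; relevant[query] is the id of the correct image."""
    total = 0.0
    for query in queries:
        qv = term_vector(query)
        ranked = sorted(labels, key=lambda img: cosine(qv, term_vector(labels[img])), reverse=True)
        total += 1.0 / (ranked.index(relevant[query]) + 1)
    return total / len(queries)

# Toy collection of image labels and query descriptions.
labels = {
    "img1": "a brown dog running on the beach",
    "img2": "two children playing football in a park",
    "img3": "a sailboat on a calm lake at sunset",
}
queries = ["dog running along the shore", "kids playing soccer"]
relevant = {"dog running along the shore": "img1", "kids playing soccer": "img2"}
print(mean_reciprocal_rank(queries, labels, relevant))  # 1.0 on this toy data
```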
Object labels assigned by humans typically consist of short, multiword phrases. These phrases exhibit syntactic and semantic structure that is not always modelled by information retrieval systems. Nonetheless, humans rely on this structure, along with background knowledge, when generating and interpreting labels. These characteristics place our task in the class of applied textual inference (ATI) problems. ATI tasks depend on some level of text understanding and background knowledge, but they are designed to abstract away from system-specific representational choices. They include summarization, question answering and recognizing textual entailment, among other problems.
Image-identity is a relation that holds between two texts, A and B, when they refer to the same
image. In our current work, we focus on the ATI problem of recognizing image-identity between
a description that serves as a query to a retrieval system and a description that labels a known
image in a collection.
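As a simple way to picture this setting, the sketch below (again an illustration under assumed names, not the method developed in the paper) frames image-identity recognition as scoring a query description against every candidate label, matching each query word to its most similar label word. The ontology_similarity stub is a hypothetical placeholder for the kind of knowledge-based similarity discussed above, here reduced to exact string match.

```python
# Hedged sketch: image-identity recognition as ranking candidate labels.
# ontology_similarity is a hypothetical stand-in for a knowledge-based
# word-similarity measure; everything here is illustrative.
from typing import Callable, Dict, List, Tuple

def ontology_similarity(word_a: str, word_b: str) -> float:
    """Placeholder similarity: 1.0 for identical words, 0.0 otherwise.
    A real system would consult an ontology or lexical resource here."""
    return 1.0 if word_a == word_b else 0.0

def identity_score(query: str, label: str,
                   sim: Callable[[str, str], float] = ontology_similarity) -> float:
    """Average, over query words, of the best similarity to any label word."""
    q_words, l_words = query.lower().split(), label.lower().split()
    if not q_words or not l_words:
        return 0.0
    return sum(max(sim(q, l) for l in l_words) for q in q_words) / len(q_words)

def rank_labels(query: str, labels: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank candidate image labels by how likely they name the same image."""
    scored = ((img, identity_score(query, text)) for img, text in labels.items())
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

print(rank_labels("dog running along the shore",
                  {"img1": "a brown dog running on the beach",
                   "img2": "two children playing football in a park"}))
```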
* Correspondence to: Alicia Sagae, Language Technologies Institute, Carnegie Mellon University, Los Angeles, CA, USA.
E-mail: atribble@cs.cmu.edu
INTELLIGENT SYSTEMS IN ACCOUNTING, FINANCE AND MANAGEMENT
Intell. Sys. Acc. Fin. Mgmt. 22, 101–113 (2015)
Published online in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/isaf.1364
