Image Retrieval with Textual Label Similarity Features
Alicia Sagae* and Scott E. Fahlman
Language Technologies Institute, Carnegie Mellon University, Los Angeles, CA, USA
Published 01 January 2015 | DOI: 10.1002/isaf.1364
SUMMARY
This article presents a knowledge-based solution for retrieving English descriptions of images. We analyse the errors made by a baseline system that relies on term frequency, and we find that the task requires deeper semantic representation. Our solution is to perform incremental, task-driven development of an ontology. Ontological features are then applied in a machine-learning algorithm for ranking candidate image descriptions. This work demonstrates the advantage of combining knowledge-based and statistical approaches for text retrieval, and it establishes the important result that an empirically tuned task-specific ontology performs better than a domain-general resource like WordNet, even on previously unseen examples. Copyright © 2015 John Wiley & Sons, Ltd.
Keywords: image retrieval; textual similarity; textual inference
1. INTRODUCTION
This paper describes experiments to retrieve images based on matching their descriptive English labels. As in ad-hoc document retrieval, a baseline system using term vectors to represent these labels performs reasonably well (>80% mean reciprocal rank (MRR)). However, error analysis reveals that the most challenging examples for this task require a richer feature space, allowing the system to capture more of the deep semantic similarities that humans seem to notice when they make comparisons between images and their descriptions. As a result, we present a solution that uses knowledge-based features for identifying when two English descriptions refer to the same image.
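The term-vector baseline and its MRR evaluation can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: labels are reduced to bag-of-words frequency vectors, candidates are ranked by cosine similarity, and MRR averages the reciprocal rank of the correct label across queries.

```python
from collections import Counter
from math import sqrt

def term_vector(label: str) -> Counter:
    """Bag-of-words term-frequency vector for a short descriptive label."""
    return Counter(label.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def mean_reciprocal_rank(queries, collection, gold):
    """Rank every collection label against each query by cosine similarity,
    then average 1/rank of the correct (gold) label over all queries."""
    total = 0.0
    for q in queries:
        qv = term_vector(q)
        ranked = sorted(collection, key=lambda d: cosine(qv, term_vector(d)), reverse=True)
        total += 1.0 / (ranked.index(gold[q]) + 1)
    return total / len(queries)
```

With one query whose correct label is ranked first, MRR is 1.0; a correct label at rank 2 would contribute 0.5, which is how the >80% MRR figure for the baseline should be read.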
Object labels assigned by humans typically consist of short, multiword phrases. These phrases exhibit syntactic and semantic structure that is not always modelled by information retrieval systems.
Nonetheless, humans rely on this structure, along with background knowledge, when generating and
interpreting labels. These characteristics place our task in the class of applied textual inference
(ATI) problems. ATI tasks depend on some level of text understanding and background knowledge,
but they are designed to abstract away from system-specific representational choices. They include
summarization, question answering and recognizing textual entailment, among other problems.
Image-identity is a relation that holds between two texts, A and B, when they refer to the same
image. In our current work, we focus on the ATI problem of recognizing image-identity between
a description that serves as a query to a retrieval system and a description that labels a known
image in a collection.
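A minimal sketch of recognizing image-identity between two labels might look like the following. The synonym table here is a toy stand-in for the ontology lookups the paper develops (the entries, the `normalize` helper, and the threshold are all hypothetical), but it shows the shape of the decision: map surface tokens to canonical concepts, then compare the normalized sets.

```python
# Toy synonym table standing in for ontology/WordNet lookups (hypothetical entries).
SYNONYMS = {
    "photo": "image",
    "picture": "image",
    "auto": "car",
    "automobile": "car",
}

def normalize(label: str) -> set:
    """Lowercase, tokenize, and map each token to a canonical concept."""
    return {SYNONYMS.get(tok, tok) for tok in label.lower().split()}

def image_identity(a: str, b: str, threshold: float = 0.5) -> bool:
    """Decide whether labels A and B plausibly refer to the same image,
    using Jaccard overlap of their normalized token sets."""
    sa, sb = normalize(a), normalize(b)
    if not sa or not sb:
        return False
    jaccard = len(sa & sb) / len(sa | sb)
    return jaccard >= threshold
```

Under this sketch, "photo of a car" and "picture of an auto" match because both normalize onto the concepts *image* and *car*, even though they share almost no surface tokens; a pure term-vector baseline would score this pair near zero.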
* Correspondence to: Alicia Sagae, Language Technologies Institute, Carnegie Mellon University, Los Angeles, CA, USA.
E-mail: atribble@cs.cmu.edu
INTELLIGENT SYSTEMS IN ACCOUNTING, FINANCE AND MANAGEMENT
Intell. Sys. Acc. Fin. Mgmt. 22, 101–113 (2015)
Published online in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/isaf.1364