Learning-Based Linguistic Indexing of Pictures
with 2-D MHMMs
James Z. Wang, Jia Li
The Pennsylvania State University, University Park, PA 16802
Automatic linguistic indexing of pictures is an important but highly
challenging problem for researchers in computer vision and
content-based image retrieval. In this paper, we introduce a
statistical modeling approach to this problem. Categorized images are
used to train a dictionary of hundreds of concepts automatically based
on statistical modeling. Images of any given concept category are
regarded as instances of a stochastic process that characterizes the
category. To measure the extent of association between an image and
the textual description of a category of images, the likelihood of the
occurrence of the image based on the stochastic process derived from
the category is computed. A high likelihood indicates a strong
association. In our experimental implementation, the ALIP (Automatic
Linguistic Indexing of Pictures) system, we focus on one particular
class of stochastic processes for describing images: two-dimensional
multiresolution hidden Markov models (2-D MHMMs). We
implemented and tested the system on a photographic image database of
600 different semantic categories, each with about 40 training images.
Tested using 3,000 images outside the training database, the system
has demonstrated good accuracy and high potential in linguistic
indexing of these test images.
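
The likelihood-based association described above can be illustrated with
a minimal sketch. It is not the ALIP implementation: a diagonal Gaussian
over small feature vectors stands in for each category's 2-D MHMM, and
the function names (fit_gaussian, log_likelihood, rank_categories), the
category labels, and the toy feature values are all hypothetical. The
sketch only shows the ranking step: fit one model per category, score a
new image under every model, and treat a higher likelihood as a stronger
association.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], List[float]]  # (means, variances)


def fit_gaussian(samples: List[Vector]) -> Model:
    """Fit a diagonal Gaussian to one category's training vectors.

    Assumption: this simple model replaces the 2-D MHMM used in the paper.
    """
    n = len(samples)
    dim = len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dim)]
    # A small variance floor keeps the log-likelihood finite.
    variances = [
        max(sum((s[d] - means[d]) ** 2 for s in samples) / n, 1e-6)
        for d in range(dim)
    ]
    return means, variances


def log_likelihood(x: Vector, model: Model) -> float:
    """Log-likelihood of feature vector x under a diagonal Gaussian."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )


def rank_categories(x: Vector, models: Dict[str, Model]) -> List[Tuple[str, float]]:
    """Score x under every category model, highest likelihood first."""
    scores = [(name, log_likelihood(x, m)) for name, m in models.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy training data (hypothetical): two categories, 2-D features per image.
training = {
    "beach": [[0.9, 0.8], [0.8, 0.9], [1.0, 0.7]],
    "forest": [[0.1, 0.3], [0.2, 0.2], [0.0, 0.4]],
}
models = {name: fit_gaussian(samples) for name, samples in training.items()}

# Rank the categories for an unseen image's feature vector.
ranking = rank_categories([0.85, 0.8], models)
print(ranking[0][0])
```

In this sketch the textual description attached to the top-ranked
categories would supply the candidate index terms for the image.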
James Z. Wang and Jia Li, "Learning-Based Linguistic Indexing of
Pictures with 2-D MHMMs," Proc. ACM Multimedia, pp. 436-445,
Juan-les-Pins, France, ACM, December 2002.
Copyright 2002 ACM. Personal use of this
material is permitted. However, permission to reprint/republish this
material for advertising or promotional purposes or for creating new
collective works for resale or redistribution to servers or lists, or
to reuse any copyrighted component of this work in other works, must
be obtained from the ACM.
July 20, 2002