CS 545I - Advanced Image and Video Databases, W 96/97

Friday, 17 Jan 1997. Second lecture, "Characterizing an Image for Retrieval"

Oscar Firschein, Visiting Scholar

Seminar abstract

In this lecture we answer questions such as:

What are the basic methods for characterizing the contents of an image?
Why is it extremely difficult to extract the meaning of an image automatically?
What are the basic issues in content-based retrieval?

Characterizing an Image

An image can be characterized by

manually assigning the image a set of terms describing the image, or
automatically assigning a set of measurements related to some property of the image, such as color or texture.

The set of terms or of measurements is then used as the basis of image retrieval.

Manually Extracting the Content of an Image

It is possible to describe an image in words, and these words can then be used to retrieve the image. For a remarkable example of manual indexing, see the San Francisco deYoung museum web site: http://www.thinker.org. Each image in their extensive collection has a description, and some of these descriptions are very detailed: "soldier shooting a gun at saint tied to tree, his crown on ground UL (upper left), small girl saint ascending staircase ...." and this description continues for 60 words! One can then search using terms that appear in the descriptions. (This site also uses the IBM QBIC engine for search based on color.)

Automatically Extracting the Content of an Image

A two-dimensional image is represented in the computer as an array of numbers, where each number is the intensity at that point in the image. A color image is represented by three arrays of numbers, one for each color. "Image understanding" is the field of research devoted to automatically deriving descriptions of an image from the intensity arrays stored in the computer.

To determine what the image is "about" we must divide the image into "meaningful" regions. Sometimes this is done by finding edges in the image to delineate the boundaries of regions, and sometimes regions are determined by grouping pixels (picture elements) having similar textures.
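As a rough illustration of the edge-finding idea, the sketch below marks a pixel as an edge when the intensity jump to its right-hand or lower neighbor is large. The function name edge_map and the threshold value are assumptions made for this example. Practical edge detectors are considerably more elaborate.

```python
def edge_map(image, threshold=50):
    """Mark pixels where the intensity jump to a neighbor exceeds the threshold.

    `image` is a list of rows of intensities (0-255). The threshold of 50
    is an arbitrary value chosen for illustration only.
    """
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Difference to the right-hand neighbor (0 at the right border).
            dx = image[y][x + 1] - image[y][x] if x + 1 < width else 0
            # Difference to the neighbor below (0 at the bottom border).
            dy = image[y + 1][x] - image[y][x] if y + 1 < height else 0
            if abs(dx) + abs(dy) > threshold:
                edges[y][x] = 1
    return edges


# A dark region on the left and a bright region on the right.
image = [[0, 0, 200, 200],
         [0, 0, 200, 200]]
print(edge_map(image))
```

The marked pixels trace the boundary between the two regions. Grouping the pixels on either side into regions would be a separate step.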

From the regions and/or the edges, one tries to determine the objects in the scene. This can be a difficult problem because objects can be occluded by other objects or by shadows, be articulated (have jointed parts that move), and have a shape that is difficult to describe.

At the present state of the art in image understanding, except for special narrow applications, one cannot automatically label an image with descriptor terms that characterize the contents of the image.

Instead, one assigns a measure to the image, based on color, texture, or some other property that characterizes an image.

Color or Texture Measures

A color histogram of an image is a plot of the number of occurrences of each intensity vs. intensity. The histogram can be used to form a "feature vector" of the image. For a color image, one would use three histograms, one per color, to characterize the image. Given a query image, its histograms are determined and a feature vector is formed. This vector is then compared with the stored set of histogram feature vectors to find an image that is "close" to the query image. To make this approach practical, various special methods are used for efficient representation and comparison of the vectors.
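The histogram comparison just described can be sketched as follows. This is a minimal illustration, not the method of any particular system. The function names, the choice of 8 bins per color, and the sum-of-absolute-differences distance are all assumptions made for the example.

```python
def channel_histogram(channel, bins=8, max_value=255):
    """Fraction of the pixels of one color array that fall in each intensity bin."""
    counts = [0] * bins
    for row in channel:
        for value in row:
            index = min(value * bins // (max_value + 1), bins - 1)
            counts[index] += 1
    total = sum(counts)
    # Normalize so that images of different sizes can be compared.
    return [c / total for c in counts]


def color_feature_vector(red, green, blue, bins=8):
    """Concatenate the three per-color histograms into one feature vector."""
    return (channel_histogram(red, bins)
            + channel_histogram(green, bins)
            + channel_histogram(blue, bins))


def l1_distance(u, v):
    """Sum of absolute differences; a smaller value means 'closer'."""
    return sum(abs(a - b) for a, b in zip(u, v))


def closest_image(query_vector, database):
    """Return the name of the stored feature vector nearest to the query."""
    return min(database, key=lambda name: l1_distance(query_vector, database[name]))


# Tiny 2x2 example arrays; a real image would have many more pixels.
dark = [[10, 20], [30, 40]]
bright = [[200, 210], [220, 230]]
database = {
    "night": color_feature_vector(dark, dark, dark),
    "day": color_feature_vector(bright, bright, bright),
}
query = color_feature_vector([[15, 25], [35, 45]], dark, dark)
print(closest_image(query, database))
```

With a large collection, comparing the query against every stored vector in this way becomes slow, which is why efficient representation and comparison methods are needed.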

We can also use texture as the basis of feature vector comparison. Typical texture measures are contrast, coarseness, and directionality. A texture-based retrieval system for browsing large-scale aerial photographs has been developed at UC Santa Barbara. Each large airphoto is segmented using texture to obtain a "texture image thesaurus," a clustering technique that facilitates the indexing process. The user can indicate a small region of interest on an airphoto, and the system will retrieve airphotos that have a similar textured region.
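To give a feel for what a texture measure might look like, the sketch below computes two very simple stand-ins: the standard deviation of the intensities as a crude contrast measure, and the average intensity change in the horizontal and vertical directions as a crude indication of directionality. These are simplifications chosen for illustration. They are not the measures used by the UC Santa Barbara system, and the function names are assumptions.

```python
import math


def contrast(image):
    """Standard deviation of the intensities, used here as a crude contrast measure."""
    pixels = [value for row in image for value in row]
    mean = sum(pixels) / len(pixels)
    return math.sqrt(sum((value - mean) ** 2 for value in pixels) / len(pixels))


def directional_change(image):
    """Mean absolute intensity change between horizontal and vertical neighbors.

    Assumes the image has at least 2 rows and 2 columns.
    """
    horizontal = [abs(row[x + 1] - row[x])
                  for row in image for x in range(len(row) - 1)]
    vertical = [abs(image[y + 1][x] - image[y][x])
                for y in range(len(image) - 1) for x in range(len(image[0]))]
    return sum(horizontal) / len(horizontal), sum(vertical) / len(vertical)


def texture_feature_vector(image):
    """Combine the simple measures into one feature vector."""
    horizontal, vertical = directional_change(image)
    return [contrast(image), horizontal, vertical]


# Vertical stripes: intensity changes strongly from left to right, not at all top to bottom.
stripes = [[0, 255, 0, 255] for _ in range(4)]
print(texture_feature_vector(stripes))
```

A strongly striped region gives a large change in one direction and almost none in the other, while a uniform region gives values near zero for all three components.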

If the feature vector is derived for the overall image, we may not get suitable retrieval results. For example, a histogram lacks any information about location in the image. If the query image contains a small colored region of particular importance, the region will get "smeared out" in an overall histogram. Instead, to better represent an image, we may partition the image and obtain a portion of the feature vector for each partition of the image. The user would then indicate a region of interest in the query image, and only that portion of the feature vector would be compared.
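The partitioning idea can be sketched as below, assuming a simple grid of cells with one histogram per cell. The 2 x 2 grid, the 4 bins, and the function names are assumptions for this example. The two stored images contain exactly the same intensities, so an overall histogram cannot tell them apart, but a comparison restricted to the marked cell can.

```python
def intensity_histogram(pixels, bins=4, max_value=255):
    """Normalized histogram of a flat list of intensities."""
    counts = [0] * bins
    for value in pixels:
        counts[min(value * bins // (max_value + 1), bins - 1)] += 1
    return [c / len(pixels) for c in counts]


def grid_histograms(image, rows=2, cols=2, bins=4):
    """Split the image into rows x cols cells and return one histogram per cell.

    Assumes the image has at least `rows` rows and `cols` columns.
    """
    height, width = len(image), len(image[0])
    cells = {}
    for r in range(rows):
        for c in range(cols):
            top, bottom = r * height // rows, (r + 1) * height // rows
            left, right = c * width // cols, (c + 1) * width // cols
            pixels = [image[y][x]
                      for y in range(top, bottom) for x in range(left, right)]
            cells[(r, c)] = intensity_histogram(pixels, bins)
    return cells


def region_distance(query_cells, stored_cells, region):
    """Compare only the cell that the user marked as the region of interest."""
    q, s = query_cells[region], stored_cells[region]
    return sum(abs(a - b) for a, b in zip(q, s))


# Bright patch in the upper left (query and match) or lower right (other).
query = [[250, 250, 0, 0], [250, 250, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
match = [[240, 240, 0, 0], [240, 240, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
other = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 240, 240], [0, 0, 240, 240]]

region = (0, 0)  # the user marks the upper-left cell
q = grid_histograms(query)
print(region_distance(q, grid_histograms(match), region))
print(region_distance(q, grid_histograms(other), region))
```

The first distance is zero and the second is large, even though the whole-image histograms of the two stored images are identical.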

Wavelet Representation

An effective method for representing an image is to take the "wavelet" transform of the image. The low-frequency coefficients of the resulting transform, which represent the objects in the image, can then be used as components of the feature vector.
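As an illustration, the sketch below uses the Haar wavelet, the simplest wavelet, in its unnormalized averaging form. The choice of Haar, the function names, and the number of levels are assumptions for this example, and the image sides are assumed to be divisible by 2 raised to the number of levels. After each level, the low-frequency coefficients sit in the upper-left quarter of the transform and amount to a coarse, half-size version of the image.

```python
def haar_step(values):
    """One level of the 1-D Haar transform: pairwise averages, then pairwise differences."""
    averages = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    details = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return averages + details


def haar_2d_level(image):
    """Apply one Haar step to every row, then to every column."""
    rows = [haar_step(row) for row in image]
    columns = [haar_step(list(column)) for column in zip(*rows)]
    # Transpose back so the result is again a list of rows.
    return [list(row) for row in zip(*columns)]


def low_frequency_features(image, levels=1):
    """Flattened low-frequency (upper-left) block after `levels` Haar levels."""
    current = [[float(value) for value in row] for row in image]
    for _ in range(levels):
        transformed = haar_2d_level(current)
        half_height, half_width = len(current) // 2, len(current[0]) // 2
        # Keep only the low-frequency quarter and transform it again.
        current = [row[:half_width] for row in transformed[:half_height]]
    return [value for row in current for value in row]


image = [[1, 3, 5, 7],
         [1, 3, 5, 7],
         [2, 2, 8, 8],
         [2, 2, 8, 8]]
print(low_frequency_features(image, levels=1))
print(low_frequency_features(image, levels=2))
```

Each additional level shortens the feature vector by a factor of four, so the number of levels trades detail against compactness.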

User Interface

Image or video retrieval is best done in an interactive mode. The system displays to the user retrieved "thumbnails" of images that are relevant to the query image, and the user can then iterate by clicking an image that best satisfies the query.

Characterizing a Video

A video is summarized or characterized by selecting the key frames that form the "story board" of the video. Once the key frames are obtained, one can use image retrieval techniques to search a video database for frames that match a query. The story-board frames are obtained by carrying out a frame-to-frame analysis that looks for significant changes, by using the audio track, and by taking advantage of knowledge of how a video is constructed.
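The frame-to-frame analysis can be sketched as follows. This example covers only that one cue and ignores the audio track and knowledge of video construction. The mean absolute intensity difference, the threshold value, and the function names are assumptions made for illustration.

```python
def frame_difference(frame_a, frame_b):
    """Mean absolute intensity difference between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count


def select_key_frames(frames, threshold=30.0):
    """Indices of frames that differ significantly from the frame before them.

    The first frame is always kept. The threshold of 30 is an arbitrary
    value chosen for illustration only.
    """
    if not frames:
        return []
    key_frames = [0]
    for i in range(1, len(frames)):
        if frame_difference(frames[i - 1], frames[i]) > threshold:
            key_frames.append(i)
    return key_frames


# Two nearly identical dark frames followed by two identical bright frames.
frames = [
    [[10, 10], [10, 10]],
    [[11, 11], [11, 11]],
    [[200, 200], [200, 200]],
    [[200, 200], [200, 200]],
]
print(select_key_frames(frames))
```

The selected frames could then be characterized by a color, texture, or wavelet feature vector and searched in the same way as still images.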