Advanced Image & Video Databases, CS545I, Winter 2001

Friday, January 26. Extracting Image Features for Image Indexing: Computational Metrics for Color, Texture, and Shape

Oscar Firschein, Visiting Scholar, Stanford Computer Science Department

An image can be indexed and compared to another image on the basis of color, texture, shape of objects, or spatial frequency. Matching images on the basis of color has been the major technique used in image retrieval systems.


Color would seem to be a simple topic: each pixel of an image has red, green and blue (RGB) intensity values. However, while RGB is the most commonly used hardware-oriented scheme for digital images, there are other color spaces based on how human observers perceive color. Uniform color spaces are spaces such that a color difference perceived by a human observer is approximated as the Euclidean distance between two points in the color space. Such spaces, obtained from RGB through a simple transformation, are more suited than RGB for image retrieval applications.
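One such transformation takes RGB through CIE XYZ to the roughly uniform CIE L*a*b* space. A minimal pure-Python sketch, using the standard sRGB/D65 constants (the function name `srgb_to_lab` is chosen here for illustration):

```python
def srgb_to_lab(r, g, b):
    # Normalize 8-bit channels and undo the sRGB gamma curve.
    def linearize(c):
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = linearize(r), linearize(g), linearize(b)
    # Linear RGB -> CIE XYZ (sRGB primaries, D65 white point).
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    # XYZ -> L*a*b*, relative to the D65 reference white.
    xn, yn, zn = 0.95047, 1.0, 1.08883
    def f(t):
        return t ** (1 / 3) if t > 0.008856 else 7.787 * t + 16 / 116
    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    L = 116 * fy - 16
    a = 500 * (fx - fy)
    b_lab = 200 * (fy - fz)
    return L, a, b_lab
```

In L*a*b*, the difference perceived between two colors is then approximated by the Euclidean distance between their (L, a, b) coordinates.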

The color distribution of an image can be represented by a histogram, each of whose bins corresponds to a range of values for each of the color components. A color histogram provides only a very coarse characterization of an image; images with similar histograms can have dramatically different semantic content. To compare two images, we can compare their color histograms by counting, for each corresponding bucket, the number of pixels that are common to both histograms. This value may then be normalized by the total number of pixels in one of the two histograms. (When each of the three color components is quantized into 16 bins, we would have to compare 16 x 16 x 16 = 4096 buckets.) There are many other schemes for histogram comparison.

To obtain a more efficient computation, the pixels of an image can be clustered on the basis of color and only the major clusters retained. The retained clusters are used as the buckets in a one-dimensional histogram, which is then the basis for histogram comparison.
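A crude stand-in for this clustering step can be sketched as follows (a real system would use a proper clustering algorithm such as k-means; here colors are simply quantized and only the k most populous buckets kept):

```python
from collections import Counter

def dominant_colors(pixels, k=8, bins=16):
    # Quantize each pixel's color, then retain only the k most
    # populous clusters as the buckets of a 1-D histogram.
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    return counts.most_common(k)
```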

To improve retrieval performance, we may divide the image into regions and compare the histograms of the regions individually. Another approach, histogram refinement, splits the pixels within a given histogram bucket into classes based on some local property. Split histograms are compared on a bucket-by-bucket basis, similar to standard histogram matching. Within a given bucket, only pixels with the same property are compared. A simple histogram refinement can be based on position in the image, where each pixel in a given color bucket is classified as either lying in a particular location of the image or not.
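A position-based refinement can be sketched as follows (a minimal illustration; the choice of "central region" as the local property and the function name are assumptions made here):

```python
def split_histogram(pixels_with_xy, width, height, bins=16):
    # Refine each color bucket by a local property: here, whether
    # the pixel lies in the central half of the image. Pixels of
    # the same color but different position land in different
    # refined buckets.
    step = 256 // bins
    h = {}
    for (x, y), (r, g, b) in pixels_with_xy:
        central = (width / 4 <= x < 3 * width / 4) and \
                  (height / 4 <= y < 3 * height / 4)
        key = (r // step, g // step, b // step, central)
        h[key] = h.get(key, 0) + 1
    return h
```

Two such split histograms are then compared bucket by bucket, exactly as in standard histogram intersection.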

Another approach is a split histogram called a color coherence vector, which partitions pixels based on their spatial coherence. A coherent pixel is part of some sizeable contiguous region, while an incoherent pixel is not. The image is first blurred, and the color space is then discretized so that there are only n distinct colors in the image. Each pixel within a given color bucket is classified as coherent or incoherent depending on whether or not it is part of a large group of pixels of the same color. Pixel groups are determined by computing connected components. (A connected component C is a maximal set of pixels such that for any two pixels in C there is a path in C between them.)
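The classification step can be sketched as follows (a minimal version assuming the blurring and color quantization have already been done; 4-connectivity and the size threshold tau are parameters chosen here for illustration):

```python
def coherence_vector(img, tau):
    # img: 2-D grid of discretized color labels. A pixel is
    # coherent if its 4-connected same-color component contains
    # at least tau pixels; otherwise it is incoherent.
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    ccv = {}  # color -> (coherent count, incoherent count)
    for i in range(h):
        for j in range(w):
            if seen[i][j]:
                continue
            color = img[i][j]
            # Flood-fill one connected component of this color.
            stack, size = [(i, j)], 0
            seen[i][j] = True
            while stack:
                y, x = stack.pop()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] \
                            and img[ny][nx] == color:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            coh, inc = ccv.get(color, (0, 0))
            if size >= tau:
                ccv[color] = (coh + size, inc)
            else:
                ccv[color] = (coh, inc + size)
    return ccv
```

Two images are then compared by comparing, per color, both the coherent and the incoherent counts.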

An alternative to the color histogram approach is the configuration-based approach that encodes a class such as water scene at sunset, snow-capped mountains, or waterfall as a template that specifies a set of salient image regions and salient qualitative relationships between these regions. For a scene of snow-capped mountains the template would be a frame broken into three horizontal regions corresponding to sky, mountains, and foreground, with the spatial, luminosity, and color relationships between adjacent regions specified. A waterfall template would divide the frame into three vertical regions with the spatial and color relationships between the parts specified.


Texture is an innate property of virtually all surfaces, and it contains important information about the structural arrangement of surfaces. Whereas color is a point property, texture is a local neighborhood property. Texture measures can be used as the basis of feature vectors. While it is possible to use various statistical measures of texture, six texture properties have been found to be visually meaningful to people: contrast, coarseness, directionality, linelikeness, regularity, and roughness. The IBM QBIC system uses three measures of texture: coarseness measures the scale of the texture (pebbles vs. boulders); contrast describes the vividness of the pattern and is a function of the variance of the gray-level histogram; and directionality describes whether the image has a favored direction or is isotropic (like a smooth object).
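The contrast measure, at least, is simple to illustrate. The exact QBIC formula is not given here; the sketch below just takes the standard deviation of the gray levels in a patch, which captures the variance-based idea:

```python
def contrast(gray):
    # Contrast as a function of gray-level variance: here simply
    # the standard deviation of the pixel intensities in a patch.
    n = len(gray)
    mean = sum(gray) / n
    var = sum((g - mean) ** 2 for g in gray) / n
    return var ** 0.5
```

A uniform patch scores 0, while a maximally vivid black-and-white pattern scores near the half-range of the intensity scale.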

The Blobworld representation of UC Berkeley is created by clustering pixels in a joint color-texture-position feature space. By finding image regions that roughly correspond to objects, querying can be carried out at the level of objects rather than global image properties.


To use the shape of objects as the basis of comparison, the objects must be separated from the background. Carrying out this partitioning is not a trivial problem because of shading effects, occlusion of one object by another, and boundaries that fade into the background. Once an object is separated from the background, its shape must be characterized in a manner that is invariant to object displacement, scale, and rotation. Some shape measures used in the IBM QBIC system are circularity, eccentricity, major axis orientation, and moment invariants.
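Circularity is the simplest of these to state: the standard definition is 4*pi*A / P^2, which is 1 for a perfect circle and smaller for elongated or irregular shapes, and which is invariant to displacement, scale, and rotation. A minimal sketch:

```python
import math

def circularity(area, perimeter):
    # 4*pi*A / P^2: equals 1 for a circle, pi/4 for a square,
    # and approaches 0 for highly elongated shapes. The ratio is
    # unchanged by translation, uniform scaling, and rotation.
    return 4 * math.pi * area / perimeter ** 2
```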


A Fourier transform represents a signal as a superposition of sinusoids with different frequencies, and the Fourier coefficients measure the contribution of the sinusoids at these frequencies. Similarly, the wavelet transform represents a signal as a sum of wavelets with different locations and scales. The wavelet coefficients essentially quantify the strengths of the contribution of the wavelet at these locations and scales. It is possible to use selected coefficients from image transforms to form a feature vector that represents an image. Comparison of two images is carried out by comparing the two feature vectors.

The University of Washington applies the Haar wavelet to multiresolution image querying. Forty to sixty of the largest-magnitude coefficients are selected from the 128 x 128 = 16,384 coefficients in each of the three color channels. The coefficients are stored as +1 or -1 along with their locations in the transform matrix. One drawback of the Haar transform is that it cannot cleanly separate an image into distinct low- and high-frequency parts.
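The averaging-and-differencing at the heart of the Haar transform is easy to sketch in one dimension (the 2-D transform applies the same step along rows and columns; this sketch shows only the decomposition, not the +1/-1 coefficient selection described above):

```python
def haar_1d(signal):
    # Full 1-D Haar decomposition: repeatedly replace adjacent
    # pairs by their average (low frequency) and half-difference
    # (high frequency). Length must be a power of two.
    out = list(signal)
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / 2 for i in range(half)]
        det = [(out[2 * i] - out[2 * i + 1]) / 2 for i in range(half)]
        out[:n] = avg + det
        n = half
    return out
```

The first output entry is the overall average of the signal; the remaining entries are detail coefficients at successively finer scales.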

The Daubechies wavelet transform has been used by the Stanford image retrieval system. For each color component of a 128 x 128 image, a 128 x 128 matrix is obtained from the Daubechies transform of the color component. The upper left 16 x 16 portion of each matrix, representing the lowest-frequency (coarsest) information in the original image, is stored as part of the feature vector. The standard deviations of the upper left 8 x 8 corner of each matrix are also stored as part of the feature vector.