Gates Building, Room 100, 3:15-4:30

February 9, 2001

Advanced Image & Video Databases, CS545I, Winter 2001

Indexing of Many-Attribute Descriptors

Prof. Carlo Tomasi, Computer Science Department, Stanford University


A new distance between two distributions, called the Earth Mover's Distance (EMD) is introduced. The EMD reflects the minimal amount of work that must be performed to transform one distribution into the other by moving "distribution mass" around. This is a special case of the transportation problem from linear optimization, for which efficient algorithms are available. The EMD also allows for partial matching. When used to compare distributions that have the same overall mass, the EMD is a true metric, and has easy-to-computer lower bounds. In this talk I focus on applications to image databases, especially color and texture. The EMD can be used to exhibit the structure of color-distribution and texture spaces by means of Multi-Dimensional Scaling displays. I also propose a novel approach to the problem of navigating through a collection of color images, which leads to a new paradigm for image database search.

Color Signatures

The color information of each image is reduced to a compact representation that we call the signature of the image. In general, the signature contains a varying number of points in a Euclidean space where a weight is attached to each point. In the case of color images, the points represent clusters of similar colors and the weight of each point is the fraction of the image area with that color.

To compute the signature of a color image, we first slightly smooth each band of the image's RGB representation to reduce possible color quantization and dithering artifactrs. We then transform the image into the CIE-LAB color space [G. Wyszecki and W.S. Styles, "Color Science: Concepts and Methods, Wiley, NY, 1982]. We coalesce this distribution into clusters of similar colors using a novel two-stage algorithm based on a k-d tree.

The signatures thus obtained are compact: the color distribution of an entire image is summarized by a handful of points, typically eight to twelve. Because of the clustering algorithm used, signatures represent well the color distribution of the image. Since signatures represent distributions in the CIE-LAB color space, they are perceptually significant, in that Euclidean distances between points are strongly correlated with percepual differences.

Distance Between Color Signatures

We define the distance between two signatures to be the minimum amount of 'work' needed to transform one signature into the other. The work needed to move a point, or a fraction of a point, to a new location is the portion of the weight being moved, multiplied by the Euclidean distance between the old and the new locations. When changing one signature to another, the work is the sum of the work done by moving the weights of the individual points of the source signature to those of the destination siignature. We allow the weight of a single source signature point to be partitioned among several destination signature points, and vice versa. We call this distance function the earth mover's distance. The computation is formalized as a linear programming problem.

Database Visualization

We also propose the use of multi-dimensional scaling (MDS) techniques to embed a group of images as points in a two- or three-dimensional Euclidean space so that their distances are preserved as much as possible. The MDS problem is: Given a set of n objects together with a matrix of the distances between each pair, to find a set of n points in d-dimensional space whose distances between pairs of points are as close as possible to the original distances.

For example, given a set of cities and the distances between each pair, it is possible using MDS techniques to derive a two-dimensional map of the cities.

Such geometric embeddings allow the user to perceive the dominant axes of variation in the displayed image group. In particular, displays of 2-d MDS embeddings can be used to organize and refine the results of a nearest-neighbor query in a perceptually intuitive way. By iterating this process, the user is able to quickly navigate to the portion of the image space of interest.

Using our color metric, it is possible to display an entire image database in either two- or three- dimensional space.