Gates Building, Room 100, 3:15-4:30

7Mar1996

joint work with Carlo Tomasi and Leonidas J. Guiba

(submitted to the IEEE for possible publication in CVPR'97)

{rubner,guibas,tomasi}@cs.stanford.edu

**Abstract**

We present a novel approach to the problem of navigating through a database of color images. We consider the images as points in a metric space in which we wish to move around so as to locate image neighborhoods of interest, based on color information. The data base images are mapped to distributions in color space, these distributions are appropriately compressed, and then the distances between all pairs I, J of images are computed based on the work needed to rearrange the mass in the compressed distribution representing I to that of J. This is the concept of the `earth mover's distance', which we argue has many desirable properties. When a query image is specified or drawn by the user, we locate and display its nearest neighbors in the database. The result images can then be used for further queries. We use multi-dimensional scaling (MDS) techniques to embed a group of images as points in two- or three-dimensional Euclidean space so that their distances are preserved as much as possible.

**Color Signatures**

The color information of each image is reduced to a compact representation
that we call the *signature* of the image. In general, the signature
contains a varying number of points in a Euclidean space where a weight
is attached to each point. In the case of color images, the points represent
clusters of similar colors and the weight of each point is the fraction
of the image area with that color.

To compute the signature of a color image, we first slightly smooth each band of the image's RGB representation to reduce possible color quantization and dithering artifactrs. We then transform the image into the CIE-LAB color space [G. Wyszecki and W.S. Styles, "Color Science: Concepts and Methods, Wiley, NY, 1982]. We coalesce this distribution into clusters of similar colors using a novel two-stage algorithm based on a k-d tree.

The signatures thus obtained are compact: the color distribution of an entire image is summarized by a handful of points, typically eight to twelve. Because of the clustering algorithm used, signatures represent well the color distribution of the image. Since signatures represent distributions in the CIE-LAB color space, they are perceptually significant, in that Euclidean distances between points are strongly correlated with percepual differences.

**Distance Between Color Signatures**

We define the distance between two signatures to be the minimum amount
of 'work' needed to transform one signature into the other. The work needed
to move a point, or a fraction of a point, to a new location is the portion
of the weight being moved, multiplied by the Euclidean distance between
the old and the new locations. When changing one signature to another,
the work is the sum of the work done by moving the weights of the individual
points of the source signature to those of the destination siignature.
We allow the weight of a single source signature point to be partitioned
among several destination signature points, and vice versa. We call this
distance function the *earth mover's distance.* The computation is
formalized as a linear programming problem.

**Database Visualization**

We also propose the use of multi-dimensional scaling (MDS) techniques to embed a group of images as points in a two- or three-dimensional Euclidean space so that their distances are preserved as much as possible. The MDS problem is: Given a set of n objects together with a matrix of the distances between each pair, to find a set of n points in d-dimensional space whose distances between pairs of points are as close as possible to the original distances.

For example, given a set of cities and the distances between each pair, it is possible using MDS techniques to derive a two-dimensional map of the cities.

Such geometric embeddings allow the user to perceive the dominant axes of variation in the displayed image group. In particular, displays of 2-d MDS embeddings can be used to organize and refine the results of a nearest-neighbor query in a perceptually intuitive way. By iterating this process, the user is able to quickly navigate to the portion of the image space of interest.

Using our color metric, it is possible to display an entire image database in either two- or three- dimensional space.