Semantics-sensitive Integrated Matching for Picture Libraries
and Biomedical Image Databases

James Z. Wang
Stanford University, Stanford, CA 94305

The need for efficient content-based image retrieval has increased tremendously in many application areas, such as biomedicine, the military, commerce, education, and Web image classification and searching. In the biomedical domain, content-based image retrieval can be used in patient digital libraries, clinical diagnosis, and the searching of 2-D electrophoresis gels and pathology slides. In this thesis, we present a wavelet-based approach for feature extraction, combined with integrated region matching. An image in the database, or a portion of an image, is represented by a set of regions, roughly corresponding to objects, which are characterized by color, texture, shape, and location. A measure of the overall similarity between images is developed as a region-matching scheme that integrates properties of all the regions in the images. The advantage of such "soft matching" is that it makes the metric robust to poor segmentation, an important problem that previous work has not solved. An experimental image retrieval system, SIMPLIcity (Semantics-sensitive Integrated Matching for Picture LIbraries), has been built to validate these methods on various image databases, including a database of about 200,000 general-purpose images and a database of more than 70,000 pathology image fragments. Our experiments show that these methods perform much better, and run much faster, than existing methods. The system is exceptionally robust to image alterations such as intensity variation, sharpness variation, intentional distortions, cropping, shifting, and rotation. This robustness is important for biomedical image databases because the visual features in a query image are rarely exactly the same as the visual features of the images in the database. The work has also been applied to the classification of on-line images and Web sites.
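The "soft matching" idea can be illustrated with a small sketch. The abstract does not spell out the matching procedure, so everything below is an assumption made for illustration: the function name soft_match_distance, the representation of a region as a (feature vector, weight) pair with weights summing to 1 per image, the Euclidean distance between feature vectors, and the greedy rule that lets the most similar region pairs claim weight first. It should not be read as the thesis's actual implementation.

```python
from math import dist  # Euclidean distance, Python 3.8+


def soft_match_distance(regions_a, regions_b):
    """Illustrative soft region matching (hypothetical helper).

    Each argument is a list of (feature_vector, weight) pairs, where
    the weights of one image sum to 1 (for example, area fractions).
    Every region may be matched to several regions of the other image.
    """
    # All region pairs, most similar (smallest distance) first.
    pairs = sorted(
        (dist(fa, fb), i, j)
        for i, (fa, _) in enumerate(regions_a)
        for j, (fb, _) in enumerate(regions_b)
    )
    left_a = [w for _, w in regions_a]  # unmatched weight per region
    left_b = [w for _, w in regions_b]
    total = 0.0
    for d, i, j in pairs:
        # A pair can claim only the weight both regions still have.
        credit = min(left_a[i], left_b[j])
        if credit > 0:
            total += credit * d
            left_a[i] -= credit
            left_b[j] -= credit
    return total


# Same content, but the second image's first region was split in two
# by the segmentation; the soft match still reports distance 0.
image_a = [((0.0, 0.0), 0.5), ((1.0, 1.0), 0.5)]
image_b = [((0.0, 0.0), 0.25), ((0.0, 0.0), 0.25), ((1.0, 1.0), 0.5)]
print(soft_match_distance(image_a, image_b))
```

Because a region's weight can be shared among several regions of the other image, an over-segmented object still matches its unsplit counterpart, which is the kind of robustness to poor segmentation the abstract describes.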

Full Paper in Color (PDF, 10MB; gzipped PostScript, 22MB)

Copyright James Z. Wang, 2000.

Last Modified: August 4, 2000