James Z. Wang
Computer Science and Biomedical Informatics
Stanford University, Stanford, CA 94305
In this demonstration, we present SIMPLIcity, an image retrieval system for picture libraries and biomedical image databases. The system uses a wavelet-based approach for feature extraction, real-time region segmentation, the Integrated Region Matching (IRM) metric, and image classification methods. Tested on large-scale picture libraries and a database of pathology images, the system has demonstrated accurate and fast retrieval. It is also exceptionally robust to image alterations.
The need for efficient content-based image retrieval has increased tremendously in many application areas, such as biomedicine, the military, commerce, education, and Web image classification and searching. In the biomedical domain, content-based image retrieval can be used in patient digital libraries, clinical diagnosis, and the searching of 2-D electrophoresis gels and pathology slides. In this demonstration, we present a wavelet-based approach for feature extraction, combined with the Integrated Region Matching (IRM) metric and image classification methods.
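To illustrate what a wavelet-based feature can look like, the following minimal Python sketch computes three texture features for a small grayscale block using a one-level 2-D Haar transform. The choice of the Haar wavelet, the block layout, and all function names are assumptions made for illustration; the text does not specify which wavelet or which features the system uses.

```python
from math import sqrt

def haar_2d_one_level(block):
    """One-level 2-D Haar transform of a square block with even side length.

    Returns four sub-bands, each half the size of the input:
    the smooth band plus horizontal, vertical, and diagonal detail bands.
    """
    half = len(block) // 2
    smooth = [[0.0] * half for _ in range(half)]
    horiz = [[0.0] * half for _ in range(half)]
    vert = [[0.0] * half for _ in range(half)]
    diag = [[0.0] * half for _ in range(half)]
    for i in range(half):
        for j in range(half):
            a = block[2 * i][2 * j]
            b = block[2 * i][2 * j + 1]
            c = block[2 * i + 1][2 * j]
            d = block[2 * i + 1][2 * j + 1]
            smooth[i][j] = (a + b + c + d) / 2.0
            horiz[i][j] = (a - b + c - d) / 2.0  # left/right differences
            vert[i][j] = (a + b - c - d) / 2.0   # top/bottom differences
            diag[i][j] = (a - b - c + d) / 2.0   # diagonal differences
    return smooth, horiz, vert, diag

def band_energy(band):
    """Root mean square of the coefficients in one sub-band."""
    values = [v for row in band for v in row]
    return sqrt(sum(v * v for v in values) / len(values))

def block_texture_features(block):
    """Three hypothetical texture features: RMS energy of each detail band."""
    _, horiz, vert, diag = haar_2d_one_level(block)
    return (band_energy(horiz), band_energy(vert), band_energy(diag))
```

A flat block yields zero energy in every detail band, whereas a block of vertical stripes concentrates its energy in the left/right difference band. Features of this kind can help separate smooth regions from textured ones.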
An image in a general-purpose picture library, or a portion of an image in a biomedical image database, is represented by a set of regions, roughly corresponding to objects, which are characterized by color, wavelet-based features, shape, and location. A measure of the overall similarity between images is developed by a region-matching scheme that integrates properties of all the regions in the images. The advantage of such "soft matching" is that it makes the metric robust to poor segmentation, an important property that previous work has not achieved. High-level image classification methods [3,6] have been developed and used to categorize images so that semantically adaptive searching methods can be applied to each category. Figure 1 shows the architecture of the feature indexing process.
Figure 1: The architecture of the feature indexing process. The heavy lines show a sample indexing path of an image.
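The soft region-matching idea described above can be sketched in a few lines of Python. This is a minimal illustration in the spirit of IRM, not the system's actual implementation: it assumes that each region carries a weight equal to its share of the image area, that region dissimilarity is a plain Euclidean distance between feature vectors, and that matching credit is assigned greedily to the most similar pairs first. The function names and the data layout are hypothetical.

```python
from math import sqrt

def region_distance(f1, f2):
    """Euclidean distance between two region feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

def soft_match_distance(regions_a, regions_b):
    """Distance between two images, each a list of (weight, features) pairs.

    Weights are assumed to be positive and to sum to 1 within each image.
    Every region may be matched to several regions of the other image.
    """
    # All cross-image region pairs, most similar first.
    pairs = sorted(
        (region_distance(fa, fb), i, j)
        for i, (_, fa) in enumerate(regions_a)
        for j, (_, fb) in enumerate(regions_b)
    )
    remaining_a = [w for w, _ in regions_a]
    remaining_b = [w for w, _ in regions_b]
    total = 0.0
    for dist, i, j in pairs:
        # Give this pair as much credit as both regions still have left.
        credit = min(remaining_a[i], remaining_b[j])
        if credit <= 0.0:
            continue
        total += credit * dist
        remaining_a[i] -= credit
        remaining_b[j] -= credit
    return total
```

Because a region can spread its weight over several regions of the other image, an object that is split into two similar regions in one image still matches a single region in the other at low cost. For example, comparing `[(0.5, (0, 0)), (0.5, (10, 0))]` with an over-segmented `[(0.5, (0, 0)), (0.25, (10, 0)), (0.25, (10, 1))]` gives a distance of 0.25 rather than a large penalty for the extra region.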
In this demonstration, we show an experimental image retrieval system, SIMPLIcity (Semantics-sensitive Integrated Matching for Picture LIbraries), built to validate these methods on various image databases, including a database of about 200,000 general-purpose images and a database of more than 70,000 pathology image fragments. We demonstrate that our methods perform much better and much faster than existing methods such as EMD-based color histogram matching and the WBIIS system based on Daubechies' wavelets. The system has a friendly user interface that can process a query based on an outside image or a hand-drawn sketch in real time.
Region-based retrieval systems typically have complicated user interfaces. The IRM metric enables us to design simple but capable query interfaces for region-based systems. The current implementation of the SIMPLIcity system provides several query interfaces: a CGI-based Web access interface, a JAVA-based drawing interface, and a CGI-based Web interface for submitting a query image of any format from anywhere on the Internet.
The Web access interface is implemented as a CGI program and is designed for querying the database with an image that is already in the database. The user may start from a random set of images from the database and click on an image in the window to form a query. Alternatively, the user may enter the ID of an image as the query.
If the user moves the mouse over a thumbnail shown in the window, the thumbnail is automatically replaced by its region segmentation, with each region painted in its representative color. This feature is important for partial region matching. For example, the user may choose a subset of the regions of an image to form a query, rather than using all the regions in the query image.
Figure 2: The JAVA drawing query interface allows users to draw sketch queries.
We have developed a JAVA-based drawing interface (Figure 2) for users to make freehand sketch queries. We allow users to draw sketches, straight lines, polygons, rectangles, and ellipses. A 24-bit color palette is provided on the interface for users to choose a representative color for each region or line drawn. We are exploring ways to specify desired textures.
Figure 3: The outside query interface. The best 17 matched images are presented for a query image selected by the user from the Stanford front Web page. The user enters the URL of the query image (shown in the upper-left corner) to establish a query. Database size: 200,000 images.
We allow the user to submit any image on the Internet as a query image by entering the URL of the image (Figure 3). Our system is capable of handling an image of any format from anywhere on the Internet, provided that it is reachable by our server via the HTTP protocol. The image is downloaded and processed by our system on the fly. The high efficiency of our image segmentation and matching algorithms makes this feature possible.
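The download step of such an outside query can be sketched as follows. This is an illustrative Python fragment under stated assumptions, not the system's CGI code: the function names, the size limit, the timeout, and the acceptance of HTTPS alongside HTTP are all hypothetical, and the format check recognizes only three common formats by their leading bytes, whereas the system described here accepts any image format.

```python
import urllib.request
from urllib.parse import urlsplit

MAX_BYTES = 5 * 1024 * 1024  # hypothetical cap on the download size

# Leading bytes ("magic numbers") of three common image formats.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}

def sniff_format(data):
    """Return the format name for recognized image data, else None."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None

def fetch_query_image(url, opener=urllib.request.urlopen, timeout=10):
    """Download a query image and return (format, raw_bytes).

    `opener` is injectable so the function can be exercised without
    network access; by default it is urllib.request.urlopen.
    """
    if urlsplit(url).scheme not in ("http", "https"):
        raise ValueError("only HTTP(S) URLs are accepted")
    with opener(url, timeout=timeout) as response:
        # Read one byte past the limit so oversized files are detectable.
        data = response.read(MAX_BYTES + 1)
    if len(data) > MAX_BYTES:
        raise ValueError("image is larger than the configured limit")
    fmt = sniff_format(data)
    if fmt is None:
        raise ValueError("unrecognized image format")
    return fmt, data
```

The returned bytes would then be decoded and passed to segmentation and matching. Rejecting oversized or unrecognized content before any further processing keeps a public query form from tying up the server with arbitrary downloads.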
This work was supported in part by the National Science Foundation under Grant No. IIS-9817511. It is joint work with Jia Li and Gio Wiederhold. The author would like to thank Desmond Chan, Oscar Firschein, Donald Regula, and Xin Wang for their generous help.