Clindex: Clustering for Similarity Queries in High-Dimensional Spaces

Chen Li, Edward Chang, Hector Garcia-Molina, James Ze Wang, and Gio Wiederhold
Department of Computer Science, Stanford University

Abstract

In this paper we present a clustering and indexing paradigm (called Clindex) for high-dimensional search spaces. The scheme is designed for approximate searches, where one wishes to find many of the data points near a target point, but where one can tolerate missing a few near points. For such searches, our scheme can find near points with high recall in very few IOs and performs significantly better than other approaches. Our scheme is based on finding clusters, and then building a simple but efficient index for them. We analyze the tradeoffs involved in clustering and building such an index structure, and present experimental results based on a 30,000 image database.

Keywords: similarity search, multidimensional indexes.