Searching Near-Replicas of Images via Clustering

Edward Chang, Chen Li, James Wang, Peter Mork and Gio Wiederhold
Department of Computer Science, Stanford University
{echang, chenli, wangz, pmork, gio}@cs.stanford.edu

Abstract

Internet piracy has been one of the major concerns for Web publishing. In this study we present a system, RIME, that we have prototyped for detecting unauthorized image copying on the World-Wide Web. To speed up copy detection, RIME uses a new clustering/hashing approach that first clusters similar images on adjacent disk cylinders and then builds indexes to access the resulting clusters. Searching for the replicas of an image often takes just one I/O to look up the location of the cluster containing similar objects and one sequential file I/O to read in this cluster. Our experimental results show that RIME can detect image copies both more efficiently and more effectively than traditional content-based image retrieval systems that use tree-like structures to index images. In addition, RIME copes well with image format conversion, resampling, requantization and geometric transformations.

Keywords: clustering, copy detection, multidimensional indexes, similarity search.
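
The two-step lookup described in the abstract (one index access to locate a cluster, then one sequential read of that cluster) can be illustrated with a minimal in-memory sketch. This is not RIME's actual algorithm: it assumes images are already reduced to short numeric feature vectors, and it substitutes a simple fixed-width grid quantization for the clustering step. The names `cell_of`, `ClusterIndex`, `width` and `threshold` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import dist, floor


def cell_of(vec, width):
    """Quantize a feature vector to the grid cell that contains it."""
    return tuple(floor(x / width) for x in vec)


class ClusterIndex:
    """Toy stand-in for a cluster index: grid cell -> list of (id, vector)."""

    def __init__(self, width):
        self.width = width                  # assumed cell width, not from the paper
        self.clusters = defaultdict(list)   # one list per cluster, stored contiguously

    def add(self, image_id, vec):
        self.clusters[cell_of(vec, self.width)].append((image_id, tuple(vec)))

    def near_replicas(self, vec, threshold):
        # Step 1: a single index lookup locates the candidate cluster.
        cluster = self.clusters.get(cell_of(vec, self.width), [])
        # Step 2: one sequential scan of that cluster filters by distance.
        return [image_id for image_id, v in cluster if dist(vec, v) <= threshold]


index = ClusterIndex(width=0.5)
index.add("original.jpg", (0.10, 0.20, 0.30))
index.add("resampled.gif", (0.12, 0.21, 0.29))
index.add("unrelated.png", (0.90, 0.80, 0.10))

result = index.near_replicas((0.11, 0.20, 0.30), threshold=0.05)
```

In this sketch a query that falls close to a cell boundary can miss replicas stored in a neighbouring cell, which is one reason a real system would need a more careful clustering scheme than a plain grid.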