Programmable Similarity for Record Matching

Arvind Arasu, Microsoft Research

Record matching, also known as entity resolution or deduplication, is an important operation in many search and data analysis applications. Its goal is to determine whether two records correspond to the same real-world entity. For example, Citeseer needs to match different citations of the same paper to provide a meaningful search experience. The standard approach to identifying matching records is to use textual similarity. However, choosing a good similarity function is something of a black art, and scores of similarity functions have been proposed, ranging from simple edit distance to more complex, domain-specific ones such as Jaro-Winkler distance. We propose a radically different approach that relies on "programming" a similarity function: we start with a simple off-the-shelf similarity function and program it using transformation rules to produce a highly customized similarity function. We argue that this approach handles the complex representational variations that arise in practice (e.g., abbreviations, synonyms) better than statically designed similarity functions. We also address the performance challenges that this approach introduces. Technology based on this work is currently used in the Bing Maps matching pipeline.

(Joint work with members of the Data Cleaning Research Team at MSR)
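To make the idea of "programming" a similarity function more concrete, here is a minimal Python sketch under stated assumptions. The base function is token-level Jaccard similarity; each transformation rule is a hypothetical (token, replacement) pair such as intl -> international; and the programmed similarity of two strings is taken to be the best base similarity over all rule-transformed versions of the two strings. The names jaccard, rewrites and programmable_similarity, the rule format, and the example strings are illustrative only, not the system described in the talk.

```python
from itertools import product

# Hypothetical rule format (an assumption for this sketch):
# (left-hand token, replacement phrase), e.g. ("intl", "international").
Rule = tuple[str, str]


def jaccard(a: frozenset, b: frozenset) -> float:
    """Off-the-shelf base similarity: Jaccard overlap of two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rewrites(text: str, rules: list[Rule]):
    """Yield the token set of every string derivable from `text` by
    optionally replacing each token that matches a rule's left-hand side."""
    options = []
    for tok in text.lower().split():
        alts = [(tok,)]  # keeping the token unchanged is always allowed
        alts += [tuple(rhs.lower().split()) for lhs, rhs in rules if lhs == tok]
        options.append(alts)
    # One derived string per combination of per-token choices.
    for choice in product(*options):
        yield frozenset(t for part in choice for t in part)


def programmable_similarity(r: str, s: str, rules: list[Rule]) -> float:
    """Best base similarity over all rule-transformed versions of r and s.
    Brute force: the number of versions grows exponentially with the
    number of tokens that match some rule."""
    s_versions = list(rewrites(s, rules))
    return max(jaccard(x, y) for x in rewrites(r, rules) for y in s_versions)


# Hypothetical abbreviation and synonym rules.
rules = [
    ("intl", "international"),
    ("corp", "corporation"),
    ("ibm", "international business machines"),
]

r = "Intl Business Machines Corp"
s = "International Business Machines Corporation"
print(programmable_similarity(r, s, []))              # plain Jaccard: 2 shared of 6 distinct tokens
print(programmable_similarity(r, s, rules))           # 1.0 once the abbreviations are expanded
print(programmable_similarity("IBM Corp", s, rules))  # 1.0 via the synonym rule
```

The exhaustive enumeration in rewrites is exponential in the number of rule matches, which gives some sense of why programming a similarity function raises performance challenges; the sketch makes no attempt to address them.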