Omar Benjelloun

 Phone:  (650) 725-4802

 Address: Department of Computer Science
          Stanford University
          Gates Hall 4A, Room 433
          Stanford, CA  94305-9040  USA


Starting October 2006, I will be working at Google.

I have been a postdoc in the InfoLab at Stanford University (formerly known as the DB Group) since October 2004, and will stay until October 2006. I am involved with the SERF and TRIO projects.

I defended my PhD thesis on September 21st, 2004 at Paris XI University, Orsay, France, after having a great time in the INRIA Gemo group, working on Active XML.


My general field of interest is data management and integration. As more and more aspects of human activity have digital counterparts, traditional database management systems are no longer sufficient to address today's data management needs. Issues such as large-scale distribution, decentralization, heterogeneity, dynamicity, and uncertainty require new data models and systems to support them:

  • Large-scale distribution and decentralized integration: Traditional approaches to data integration involve managing schemas and mappings between them. Active XML proposes an alternative approach, by allowing the integration to happen directly in the data (just like hyperlinks on the Web). External data sources are encapsulated as Web services. Calls to these Web services can be embedded in the data, "materialized" by calling the Web services, or exchanged between systems. The system is now available under an open source license (LGPL).
  • Uncertainty and lineage: More and more applications manipulate data that is uncertain, and this uncertainty has to do with where the data comes from (its lineage). In TRIO, our goal is to propose a simple and usable representation for uncertain data and its lineage, by extending the relational model, and to build a system that supports this representation.
  • Entity Resolution: An important issue in data integration is that the same "real-life entities" can be referred to in different ways in data records. In SERF, we propose a generic framework for "resolving" entities, i.e., identifying records that represent the same entities, and merging them. In our approach, the functions that compare records and merge them are black boxes. Properties of these functions are leveraged to obtain efficient entity resolution algorithms.
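
To make the entity resolution item above more concrete, here is a minimal sketch of resolution driven by black-box compare and merge functions. It is only an illustration, not the SERF algorithms themselves: the names `resolve`, `share_a_value` and `union_records`, the record representation (a frozenset of attribute/value pairs) and the sample data are all assumptions made for this example. The loop always terminates, but its result is only independent of the processing order when the two functions are well behaved (for example, a merge that is commutative and associative).

```python
from typing import Callable, Iterable, List, TypeVar

R = TypeVar("R")


def resolve(
    records: Iterable[R],
    match: Callable[[R, R], bool],
    merge: Callable[[R, R], R],
) -> List[R]:
    """Merge matching records until no two resolved records match.

    `match` and `merge` are treated as black boxes (hypothetical signatures).
    """
    pending = list(records)   # records still to be compared
    resolved: List[R] = []    # records known not to match each other
    while pending:
        current = pending.pop()
        partner = next((r for r in resolved if match(current, r)), None)
        if partner is None:
            resolved.append(current)
        else:
            # The merged record may match records that neither input
            # matched, so it goes back to the pending list.
            resolved.remove(partner)
            pending.append(merge(current, partner))
    return resolved


# Illustrative black boxes: two records match if they share any
# (attribute, value) pair, and merging takes the union of their pairs.
def share_a_value(a: frozenset, b: frozenset) -> bool:
    return not a.isdisjoint(b)


def union_records(a: frozenset, b: frozenset) -> frozenset:
    return a | b


people = [
    frozenset({("name", "J. Doe"), ("phone", "555-0100")}),
    frozenset({("name", "John Doe"), ("phone", "555-0100")}),
    frozenset({("name", "John Doe"), ("email", "jd@example.org")}),
    frozenset({("name", "A. Smith")}),
]
entities = resolve(people, share_a_value, union_records)
```

With this sample data, the first three records end up merged into a single entity (the first two share a phone number, the last two share a name), while the "A. Smith" record stays on its own.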

Publications (by topic)

Publications are given in reverse chronological order of appearance. Papers under review appear as technical reports. You can also check out my DBLP entry.


Omar Benjelloun, Anish Das Sarma, Alon Halevy, and Jennifer Widom: ULDBs: Databases with Uncertainty and Lineage, to appear in VLDB'06 (available here).

Parag Agrawal, Omar Benjelloun, Anish Das Sarma, Chris Hayworth, Shubha Nabar, Tomoe Sugihara, Jennifer Widom: Trio: A System for Data, Uncertainty, and Lineage, demo at VLDB'06.

Omar Benjelloun, Anish Das Sarma, Chris Hayworth, and Jennifer Widom: An Introduction to ULDBs and the Trio System, IEEE Data Engineering Bulletin, Special Issue on Probabilistic Databases, 29(1), March 2006 (available here).

Anish Das Sarma, Omar Benjelloun, Alon Halevy, Jennifer Widom: Working Models for Uncertain Data, ICDE 2006 (available here).


Omar Benjelloun, Hector Garcia-Molina, Hideki Kawai, Tait Larson, David Menestrina, Sutthipong Thavisomboon: D-Swoosh: A Family of Algorithms for Generic, Distributed Entity Resolution, Technical Report, 2006 (available here).

David Menestrina, Omar Benjelloun, Hector Garcia-Molina: Generic Entity Resolution with Data Confidences, First Int'l VLDB Workshop on Clean Databases (2006) (Technical report available here).

Omar Benjelloun, Hector Garcia-Molina, Jeff Jonas, Qi Su, Jennifer Widom: Swoosh: A Generic Approach to Entity Resolution, Technical Report, 2005 (available here).

Active XML

Serge Abiteboul, Omar Benjelloun, Tova Milo: The Active XML Project, an Overview, Technical Report, 2005 (pdf).

Serge Abiteboul, Tova Milo, Omar Benjelloun: Regular and Unambiguous Rewritings for Active XML, PODS 2005 (pdf).

Tova Milo, Serge Abiteboul, Bernd Amann, Omar Benjelloun, Fred Dang Ngoc: Exchanging Intensional XML Data (Extended version), ACM TODS, 2005 (pdf).

Serge Abiteboul, Omar Benjelloun, Bogdan Cautis, Irini Fundulaki, Tova Milo, Arnaud Sahuguet: An Electronic Patient Record on Steroids: Distributed, Peer to Peer, Secure and Privacy Conscious (demo), VLDB 2004 (pdf).

Serge Abiteboul, Omar Benjelloun, Tova Milo: Positive Active XML, PODS 2004 (pdf).

Serge Abiteboul, Omar Benjelloun, Bogdan Cautis, Ioana Manolescu, Tova Milo, Nicoleta Preda: Lazy Query Evaluation for Active XML, SIGMOD 2004 (pdf).

Serge Abiteboul, Omar Benjelloun, Tova Milo, Ioana Manolescu, Roger Weber: Active XML: A Data-Centric Perspective on Web Services, book chapter in Web Dynamics, Springer, to appear in March 2004.

Serge Abiteboul, Bernd Amann, Jerome Baumgarten, Omar Benjelloun, Frederic Dang Ngoc, Tova Milo: Schema-driven Customization of Web Services, VLDB 2003 (demo) (pdf).

Tova Milo, Serge Abiteboul, Bernd Amann, Omar Benjelloun, Frederic Dang Ngoc: Exchanging Intensional XML Data, SIGMOD 2003 (ps).

Serge Abiteboul, Omar Benjelloun, Tova Milo: Web Services and Data Integration, WISE 2002 (pdf).

Serge Abiteboul, Omar Benjelloun, Ioana Manolescu, Tova Milo, Roger Weber: Active XML: Peer-to-Peer Data and Web Services Integration, VLDB 2002 (demo) (ps).

Serge Abiteboul, Omar Benjelloun, Tova Milo: Towards a Flexible Model for Data and Web Services Integration, FMII workshop, 2001.

Teaching

In the summer of 2005, I co-taught CS245 - Database System Principles at Stanford, with Mor Naaman.

During my PhD I was also a T.A. at Paris XI University, Orsay. I mainly taught labs on databases, data integration and web programming, for undergraduate students.

My teaching advisor there was Emmanuel Waller.

Academic year 2001-2002:

  • IUP3BD2: Databases II, tutorial sessions in IUP3 (Instructor: E. Waller).
  • CFA320: Integration of heterogeneous information, tutorial sessions in CFA2 (Instructor: N. Spyratos).

Academic year 2002-2003:

  • TC508: Information management on the Internet, tutorial sessions in DESS SCHM/II (Instructor: P. Rigaux).
  • IUP3BD2: Databases II, tutorial sessions in IUP3 (Instructor: E. Waller).
  • S2-Pascal: Pascal - Imperative approach, lab sessions in DEUG MIAS (Instructors: F. Lefevre, B. Safar).
  • DEA III: Introductory lecture on Web services, as part of S. Abiteboul's course on semi-structured data (slides, pdf).

Academic year 2003-2004:

  • F3BD2: Databases II, tutorial sessions in FIIFO 3 (Instructor: E. Waller).
  • DEA III: Semi-structured data, with Serge Abiteboul, Benjamin N'Guyen and Ioana Manolescu.


A few places where I have lived / studied / worked:

