Stanford Global InfoBase

Project Summary

Our work is driven by the vision of a Global InfoBase (GIB): a ubiquitous and universal information resource, simple to use, up to date, and comprehensive. The project consists of four interrelated thrusts:

(i) Combining Technologies: integrating technologies for information retrieval, database management, and hypertext navigation, to achieve a "universal" information model;

(ii) Personalization: developing tools for personalizing information management;

(iii) Semantics: using natural-language processing and structural techniques to analyze the semantics of Web pages; and

(iv) Data Mining: designing new algorithms for mining information in order to synthesize new knowledge.

People

Faculty

Students (full-time and part-time, grad and undergrad)

Alums

Papers

Sepandar D. Kamvar, Taher H. Haveliwala, Christopher D. Manning, and Gene H. Golub. Extrapolation Methods for Accelerating PageRank Computations. Submitted to WWW2003.
Sriram Raghavan and Hector Garcia-Molina. Integrating Diverse Information Management Systems: A Brief Survey. IEEE Data Engineering Bulletin, December 2001.
Cheng Yang. Music Database Retrieval Based on Spectral Similarity. International Symposium on Music Information Retrieval, October 2001.
T. Haveliwala. Search Facilities for Internet Relay Chat. To appear in Proceedings of the Joint Conference on Digital Libraries (poster session), 2002.
C. Olston and J. Widom. Best-Effort Cache Synchronization with Source Cooperation. Proceedings of the ACM SIGMOD International Conference on Management of Data, 2002.
C. Olston and J. Widom. Approximate Caching for Continuous Queries over Distributed Data Sources. Technical Report, February 2002.
C. Olston, B. T. Loo, and J. Widom. Adaptive Precision Setting for Cached Approximate Values. Proceedings of the ACM SIGMOD International Conference on Management of Data, May 2001.
D. Klein and T. Haveliwala. Concise Labeling of Document Clusters. Submitted. Technical Report, Stanford University, April 2002.
Sriram Raghavan and Hector Garcia-Molina. Crawling the Hidden Web. Proceedings of the 27th International Conference on Very Large Data Bases (VLDB), pp. 129-138, September 2001.
T. Haveliwala. Topic-Sensitive PageRank. Proceedings of the Eleventh International World Wide Web Conference, 2002.
T. Haveliwala, A. Gionis, D. Klein, and P. Indyk. Evaluating Strategies for Similarity Search on the Web. Proceedings of the Eleventh International World Wide Web Conference, 2002.
D. Klein, S. Kamvar, and C. Manning. From Instance-level Constraints to Space-level Constraints: Making the Most of Prior Knowledge in Data Clustering. Proceedings of the Nineteenth International Conference on Machine Learning, 2002.
S. Kamvar, D. Klein, and C. Manning. Interpreting and Extending Classical Agglomerative Clustering Algorithms using a Model-Based Approach. Proceedings of the Nineteenth International Conference on Machine Learning, 2002.
Glen Jeh and Jennifer Widom. SimRank: A Measure of Structural-Context Similarity. Technical Report, Computer Science Department, Stanford University, 2001.
Glen Jeh and Jennifer Widom. Scaling Personalized Web Search. Technical Report, Computer Science Department, Stanford University, 2002.

WWW

Sites relevant to the project include the DB Group, InfoLab, NLP Group, and Digital Libraries Project home pages.

Reports

Progress Report, 2002

Progress Report, 2001

Last modified: Jan. 22, 2003