Our work is driven by the vision of a Global InfoBase (GIB): a ubiquitous and
universal information resource, simple to use, up to date, and comprehensive.
The project consists of four interrelated thrusts:
(i) Combining Technologies: integrating technologies for information
retrieval, database management, and hypertext navigation, to achieve a
"universal" information model;
(ii) Personalization: developing tools for personalizing information;
(iii) Semantics: using natural-language processing and structural techniques
for analyzing the semantics of Web pages; and
(iv) Data Mining: designing new algorithms for mining information in order to
synthesize new knowledge.
Students (full-time and part-time, grad and undergrad)
Sepandar D. Kamvar, Taher H. Haveliwala, Christopher D. Manning, and Gene H. Golub. Extrapolation
Methods for Accelerating PageRank Computations. Submitted to WWW2003.
Sriram Raghavan and Hector Garcia-Molina. Integrating
Diverse Information Management Systems: A Brief Survey. Proceedings of
the IEEE Data Engineering Bulletin, December 2001.
Cheng Yang. Music
Database Retrieval Based on Spectral Similarity. International
Symposium on Music Information Retrieval, October 2001.
T. Haveliwala. Search Facilities for Internet Relay Chat. To appear in
Proceedings of the Joint Conference on Digital Libraries (Poster session),
C. Olston and J. Widom. Best-Effort
Cache Synchronization with Source Cooperation. ACM SIGMOD 2002.
C. Olston and J. Widom. Approximate
Caching for Continuous Queries over Distributed Data Sources. Technical
Report, February 2002.
C. Olston, B. T. Loo, and J. Widom. Adaptive
Precision Setting for Cached Approximate Values. ACM SIGMOD
International Conference on Management of Data, May 2001.
D. Klein and T. Haveliwala. Concise Labeling of Document Clusters.
Submitted. Technical Report, Stanford University, April 2002.
Sriram Raghavan and Hector Garcia-Molina. Crawling
the Hidden Web. Proceedings of the 27th Intl. Conf. on Very Large
Databases (VLDB), pp. 129-138, September 2001.
T. Haveliwala. Topic-Sensitive
PageRank. Proceedings of the Eleventh International World Wide Web
Conference, 2002.
T. Haveliwala, A. Gionis, D. Klein, and P. Indyk. Evaluating
Strategies for Similarity Search on the Web. Proceedings of the Eleventh
International World Wide Web Conference, 2002.
D. Klein, S. Kamvar, and C. Manning. From
Instance-level Constraints to Space-level Constraints: Making the Most of
Prior Knowledge in Data Clustering. Proceedings of the Nineteenth
International Conference on Machine Learning, 2002.
S. Kamvar, D. Klein, and C. Manning. Interpreting
and Extending Classical Agglomerative Clustering Algorithms using a
Model-Based Approach. Proceedings of the Nineteenth International
Conference on Machine Learning, 2002.
Glen Jeh and Jennifer Widom. SimRank:
A Measure of Structural-Context Similarity. Technical Report, Computer
Science Department, Stanford University, 2001.
Glen Jeh and Jennifer Widom. Scaling
Personalized Web Search. Technical Report, Computer Science Department,
Stanford University, 2002.
Sites relevant to the project include: DB
Group home page, Infolab home page,
NLP Group home page, Digital
Libraries project home page.
Last modified: Jan. 22, 2003