References for introductory lecture (9/26)
- Andrei Broder et al. "Graph Structure in the Web". WWW9 conference, 2000.
- Chris Anderson, "The Long Tail". Wired magazine, October 2004.
- Sergey Brin and Larry Page. "The Anatomy of a Large-Scale Hypertextual Web Search Engine". WWW7, 1998.
- Lada A. Adamic. "Zipf, Power-laws, and Pareto - a ranking tutorial."
- Lada A. Adamic and Bernardo A. Huberman. "Zipf's law and the Internet." Glottometrics 3, 2002, 143-150.
References for Web Crawling lecture (10/05)
- Junghoo Cho, Hector Garcia-Molina, Lawrence Page.
"Efficient Crawling Through URL Ordering."
Computer Networks and ISDN Systems, 30(1-7):161-172, 1998.
- Junghoo Cho, Hector Garcia-Molina.
"Effective page refresh policies for Web crawlers."
ACM Transactions on Database Systems, 28(4), December 2003.
- Ka Cheung Sia, Junghoo Cho.
"Efficient Monitoring Algorithm for Fast News Alert."
Technical report, UCLA, 2005.
- M. Najork and J. L. Wiener.
"Breadth-First Crawling Yields High-Quality Pages."
In Proceedings of the 10th International World Wide Web Conference,
pages 114-118, Hong Kong, May 2001.
- Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, Geoffrey Zweig.
"Syntactic Clustering of the Web."
WWW6, 1997.
References for Topic-Specific Page Rank lecture (10/19)
- Taher Haveliwala.
Topic-Sensitive PageRank. Proceedings of WWW11, 2002.
- Glen Jeh and Jennifer Widom. Scaling Personalized Web
Search. Proceedings of WWW12, 2003.
References for Web Spam lecture (10/24)
- Zoltán Gyöngyi, Hector Garcia-Molina.
Web Spam Taxonomy.
First International Workshop on Adversarial Information Retrieval on the
Web (at the 14th International World Wide Web Conference), Chiba, Japan, 2005.
- Zoltán Gyöngyi, Hector Garcia-Molina and Jan Pedersen.
Combating Web Spam with TrustRank.
30th International Conference on Very Large Data Bases (VLDB),
Toronto, Canada, 2004.
References for Relation Extraction lecture (11/07)
- Sergey Brin.
Extracting Patterns and Relations from the World Wide Web.
WebDB Workshop at 6th International Conference on
Extending Database Technology, EDBT'98, 1998.
- Eugene Agichtein and Luis Gravano.
Snowball: Extracting Relations from Large Plain-Text Collections.
Proceedings of the Fifth ACM International Conference on Digital
Libraries, 2000.
- S. Dumais, M. Banko, E. Brill, J. Lin and A. Ng.
Web question answering: Is more always better?
In Proceedings of SIGIR'02, Aug 2002, pp. 291-298.
References for Virtual Databases lecture (11/09)
- Nicholas Kushmerick, Daniel S. Weld, Robert Doorenbos.
Wrapper Induction for Information Extraction.
Intl. Joint Conference on Artificial Intelligence (IJCAI), 1997.
- Anand Rajaraman, Jeffrey D. Ullman.
Querying Websites using Compact Skeletons.
Journal of Computer and System Sciences 66(4): 809-851 (2003).