|CS345, Autumn 2006: Data Mining.|
The Final for 2006.
Instructors: Anand Rajaraman (anand @ kosmix dt com), Jeffrey D. Ullman (ullman @ gmail dt com).
TA: Jeff Klingner
Email Address for Questions: cs345a-aut0607-staff @ lists dt stanford dt edu (This is the best way to reach all three of us simultaneously)
Meeting: MW 3:15 - 4:30PM; Room: 200-030 (in the History Corner, the part of the Quad closest to Hoover Tower).
Office Hours: Instructors will be available after classes that they teach. Jeff Ullman is in 433 Gates and Anand in 413 Gates. Jeff Klingner's office hours: Tuesdays 10am-noon & Thursdays 3pm-5pm, Gates 396, or by appointment.
Prerequisites: CS145 or equivalent.
Materials: There is no text, but students will use the Gradiance automated homework system for which a nominal fee will be charged. Notes and/or slides will be posted on-line. You can see earlier versions of the notes and slides covering Data Mining. Not all these topics will be covered this year.
Requirements: There will be periodic homework assignments (some on-line, using the Gradiance system), a final exam, and a project on web mining, using the Stanford WebBase. The homework will count just enough to encourage you to do it, about 20%. The project and final will account for the bulk of the credit, in roughly equal proportions.
Newsgroup: There is a class newsgroup: su.class.cs345a on nntp.stanford.edu. You can use the newsgroup to share datasets, form study groups, or find project partners. The course staff will not read the newsgroup regularly, and we won't use it for any official announcements. To get in touch with us, use cs345a-aut0607-staff @ lists dt stanford dt edu.
|Date||Topic||PowerPoint Slides||PDF Document|
|9/25||Introduction to Web Mining||PPT|
|9/27||Association Rules 1||PPT|
|10/2||Association Rules 2||PPT|
|10/9||Topic-Specific Page Rank||PPT|
|10/11||HITS and Spam||PPT|
|10/16||Near-Neighbors and Minhashing||PPT|
|10/23||Clustering - Part 1||PPT|
|10/30||Clustering - Part 2||PPT|
|11/01||Structured Data Extraction||PPT|
|11/13||Online Algorithms, Search Advertising||PPT|
|11/15||Stream Mining 1||PPT|
|11/27||Stream Mining 2||PPT|
|11/27||Stream Mining 3||PPT|
|11/29||Stream Mining 4||PPT|
Solutions appear after the problem set is due. However, you must submit at least once so that your most recent submission appears with the solutions embedded.
|Assignment||Due|
|Association Rules #1||Tuesday, Oct. 10 (11:59PM)|
|Association Rules #2||Wednesday, Oct. 11 (11:59PM)|
|Page Rank||Monday, Oct. 16 (11:59PM)|
|Minhashing, LSH||Wednesday, Oct. 30 (11:59PM)|
|HITS, TSPR, Spam||Monday, Oct. 30 (11:59PM)|
|Project Proposal||Wednesday, Nov. 1 (11:59PM)|
|Distance Measures||Monday, Nov. 6 (11:59PM)|
|Recommendation Systems||Wednesday, Nov. 8 (11:59PM)|
|Clustering||Monday, Nov. 13 (11:59PM)|
|Stream Mining||Wednesday, Dec. 6 (11:59PM)|
Please submit your proposal in a reasonable format (text, HTML, PDF, etc.) to the staff email list (cs345a-aut0607-staff @ lists dt stanford dt edu). One copy per group is fine.
This is intended mainly as a check to make sure you've got a workable idea and a plan for carrying it through, so don't spend a lot of time making it beautiful, but do think seriously about what you're planning to do. It is to your benefit to flesh out your ideas as much as possible in this proposal.
|12/4||3:15-4:00||Greg Linden||Guest Lecture: Amazon's Recommendation Engine|
|12/4||4:00-4:10||Abhita Chugh and Ravi Tiruvury||Detecting Web Spam with CombinedRank|
|12/4||4:10-4:20||Rahul Thathoo and Zahid Khan||Towards Implementing Better Movie Recommendation Systems|
|12/4||4:20-4:30||Brian Tran and Minho Kim||Topic Specific Recommendation|
|12/4||4:30-4:40||David Reiss||Identifying terms with similar meanings across corpora|
|12/4||4:40-4:50||NielFred Picciotto||Finding Interesting Videos Early via Trend-Setting Viewers|
|12/4||4:50-5:00||Sean Kandel||Web Data Extraction Using Tag Trees|
|12/4||5:00-5:10||Priyank Chodisetti||A shot at Netflix Challenge - Hybrid Recommendation System|
|12/6||3:15-3:25||Hayato Akatsuka||Weather Mining|
|12/6||3:25-3:35||Alex Giladi||Using LSH for motion estimation|
|12/6||3:35-3:45||Joseph Bonneau||Sports Performance and Salary|
|12/6||3:45-3:55||Negin Nejati||Web Mining for Extracting Relations|
|12/6||3:55-4:05||Vincenzo Di Nicola and Jyotika Prasad||42: A Web Based Question Answering System|
|12/6||4:05-4:15||Manjunath Rajashekhar||Frequent Itemsets Mining in Distributed Wireless Sensor Networks|
|12/6||4:15-4:25||Hao Liu||Clustering Based News Event Detection and Tracking|
|12/6||4:25-4:35||Jack Cheng||Improvements on Netflix Recommendation System Using Data-mining Algorithms|
|12/6||4:35-4:45||Arpit Aggarwal and Omkar Mate||Recommendation System for Portfolio Management|
|12/6||4:45-4:55||Romain Colle||Near-duplicates detection: Comparison of the two algorithms seen in class|
|12/6||4:55-5:05||Alan Sheinberg and Greg Nelson||Netflix Challenge: Combined Collaborative Filtering|
|12/6||5:05-5:15||Fred Wulff||Course Helper: A Course Recommendation System|
|11/01||Extracting Structured Data from the Web||AR|
|11/06||Extracting Structured Data from the Web||AR|
|11/13||Advertising on the web||AR|
|12/13||Final Exam, 12:15pm - 3:15pm|