CS345 --- Final Exam

Directions

You have from 9AM Friday June 2 through 5PM June 5, 2000 to answer this exam.
Return the exam to 411 Gates.
You may use any written materials, but you must cite sources when appropriate. Since we expect most of these sources will be the course readings, these readings have been numbered on the Class Web Page so you can refer to them succinctly. For example, the Fastmap paper is [5], and Kleinberg's paper is [9a].
Do not forget to sign the Honor Code. Of course, the work you hand in must be your own; no collaboration or borrowing from others is permitted.
You are limited to 5 pages at 10 pt. Alternatively, you may use 6 pages at 11 pt. or 7 pages at 12 pt. The latter options will be appreciated by the instructor, but do not represent an increase in the allotted verbiage, since the number of pages needed for a given document rises approximately with the square of the point size.

The Exam

Nile.com is a large, on-line bookseller. They have 10 million customers and sell one million different books. Their sales history may be represented by a matrix with the entry in row r and column c set to 0 if customer r has not bought book c, and set to the date of purchase if customer r did buy book c. You may choose how this matrix is stored, as long as the representation does not conceal the solution to the problem in an unrealistic way.
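As an illustration of one storage choice, note that a dense matrix with 10 million rows and one million columns would have 10^13 entries, almost all of them 0. The sketch below keeps only the nonzero entries, keyed by customer. It is a minimal example under stated assumptions, not a prescribed representation: the function names, the integer IDs, and the sample purchases are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def build_sales_index(purchases):
    """Build a sparse row-oriented view of the sales matrix.

    purchases: iterable of (customer_id, book_id, purchase_date) triples.
    Only actual purchases are stored; absent entries stand for 0.
    """
    by_customer = defaultdict(dict)
    for customer, book, when in purchases:
        by_customer[customer][book] = when
    return by_customer


def entry(by_customer, r, c):
    """Matrix entry (r, c): the purchase date, or 0 if customer r has not bought book c."""
    # .get avoids creating empty rows for customers with no purchases.
    return by_customer.get(r, {}).get(c, 0)


# Hypothetical sample data, for illustration only.
sample = [
    (1, 101, date(2000, 1, 15)),
    (1, 205, date(2000, 3, 2)),
    (2, 101, date(1999, 12, 24)),
]
index = build_sales_index(sample)
```

Here `entry(index, 1, 101)` gives the stored purchase date, while `entry(index, 2, 205)` gives 0, matching the matrix definition above. A column-oriented copy (keyed by book) could be built the same way if an algorithm needs to scan by book instead.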

Nile wishes to recommend to each customer books that they might like, based on the information available. If, for example, they determine that the customer is a ``database person,'' because they have bought several ``database books'' in the past, then they might recommend another database book. On the other hand, if the customer has already bought one book on SQL2, then they might be less likely to buy another book on the same topic, but instead might buy a book on transaction processing, or Oracle-8i. Perhaps the historical behavior of other ``similar'' customers would be helpful.

On the other hand, the notion of ``similar customer'' is slippery. Two ``database people'' may have bought zero books in common, for example. Worse, people usually aren't of one kind. Some ``database people'' will also buy books about travel, some about music, and others romance novels, so customers can fall into several categories. Moreover, the notion of ``database book'' is slippery as well. Perhaps it would be more useful to classify books as ``Oracle,'' ``Sybase,'' ``SQL,'' ``OQL,'' and similar, possibly overlapping, categories. Is a book on the history of computing a book about history, or computing, or both?
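To make the first difficulty concrete, here is a small sketch that measures customer similarity by the overlap of their purchase sets. Jaccard similarity is an assumption chosen for illustration; the exam does not prescribe any particular measure. The customers and book titles are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two purchase sets: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty sets are treated as having similarity 0.
    return len(a & b) / len(union) if union else 0.0


# Two hypothetical "database people" with no books in common.
alice = {"SQL primer", "Transaction processing"}
bob = {"Oracle guide", "OQL handbook"}

similarity = jaccard(alice, bob)
```

Both customers bought only database books, yet `similarity` is 0.0 because the sets are disjoint. A measure based directly on shared purchases therefore misses exactly the relationship Nile cares about, which is why some notion of book category, or similarity through intermediate customers, may be needed.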

Your Question: Consider each of the papers we have read in this course. For each, tell whether you think the content would be useful in solving Nile.com's problem described above, and if so, how? It is OK to say ``no help'' for some of the papers, but you should make positive use of ideas from at least a few of these papers. If you think a paper will not address this problem, give a sentence stating why. If you describe an algorithm, emphasize the important steps rather than the details.

Discuss the running time of your algorithm(s). You should use some secondary-storage model of complexity, such as number of passes through the data or number of disk I/O's. Also, if there are significant main-memory portions of your algorithm, make sure that a reasonable machine will have enough memory, and that the main-memory algorithm you propose can be performed on a realistic machine in a reasonable amount of time.
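A rough sketch of the kind of back-of-envelope estimate this asks for is shown below. Every number other than the 10 million customers is an assumption made for illustration: the average of 20 purchases per customer, the 12-byte purchase record, the 4096-byte disk block, and the two-pass algorithm are all hypothetical, and the function names are invented here.

```python
def blocks_per_pass(num_records, bytes_per_record, block_bytes):
    """Disk blocks read in one sequential pass over the data (rounded up)."""
    total_bytes = num_records * bytes_per_record
    return -(-total_bytes // block_bytes)  # ceiling division


def io_cost(num_records, bytes_per_record, block_bytes, passes):
    """Total block reads for an algorithm that makes `passes` full passes."""
    return passes * blocks_per_pass(num_records, bytes_per_record, block_bytes)


CUSTOMERS = 10_000_000          # from the problem statement
AVG_PURCHASES = 20              # assumption: purchases per customer
RECORD_BYTES = 12               # assumption: customer ID, book ID, date
BLOCK_BYTES = 4096              # assumption: disk block size

records = CUSTOMERS * AVG_PURCHASES
one_pass = blocks_per_pass(records, RECORD_BYTES, BLOCK_BYTES)
two_passes = io_cost(records, RECORD_BYTES, BLOCK_BYTES, passes=2)
```

Under these assumptions the data occupies 2.4 billion bytes, so one pass reads 585,938 blocks and a two-pass algorithm reads 1,171,876. The same style of arithmetic applies to main memory: one 4-byte counter per book needs only about 4 million bytes, whereas one counter per pair of books would need roughly 5 x 10^11 counters, which no realistic machine holds in memory.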

Also state any assumptions you make. You will have to make some assumptions, since the problem is not specified to the ultimate level of detail. However, your assumptions should be consistent with reality, e.g., don't assume that there are 10 books that almost all customers buy, or that the typical customer buys 10,000 books.

Some Words of Advice

I'm concerned that people may try to read something into this exam question that is not there. Here is what I would like to see:

First, there isn't only one right answer.
I would like people to show that they understand some, or preferably all, of the papers and topics covered by discussing their use in a situation that is neither the same as what was covered in class, nor completely different.
I am aware that the page limit requires you to choose your topics carefully, and I also expect that you will have to ration what you say about each one. While you might like to go into extreme detail about, say, an algorithm, you will not have the space to do so. Rather, write down the most important or most tricky points, to show that you have thought the matter through, relying on me to accept that you could also do the more standard things, or things that are like something in a paper you cite.
I will be answering email during the weekend, and will respond to questions of scope or form (not content, of course). My email is ullman @ db.stanford.edu. I also suggest that you check email over the weekend, in case there are clarifications that I need to broadcast to the entire class.