CS345 Final Exam


The Exam

Nile.com is a large, on-line bookseller. They have 10 million customers and sell one million different books. Their sales history may be represented by a matrix with the entry in row r and column c set to 0 if customer r has not bought book c, and set to the date of purchase if customer r did buy book c. You may choose how this matrix is stored, as long as the representation does not conceal the solution to the problem in an unrealistic way.

Nile wishes to recommend to each customer books that they might like, based on the information available. If, for example, they determine that the customer is a ``database person,'' because they have bought several ``database books'' in the past, then they might recommend another database book. On the other hand, if they already bought one book on SQL2 then they might be less likely to buy another book on the same topic, but instead might buy a book on transaction processing, or Oracle-8i. Perhaps, the historical behavior of other ``similar'' customers would be helpful.

On the other hand, the notion of ``similar customer'' is slippery. Two ``database people'' may have bought zero books in common, for example. Worse, people usually aren't of one kind. Some ``database people'' will also buy books about travel, some about music, others buy romance novels, so customers can fall into several categories. Moreover, the notion of ``database book'' is slippery as well. Perhaps it would be more useful to classify books as ``Oracle,'' ``Sybase,'' ``SQL,'' ``OQL,'' and similar, possibly overlapping, categories. Is a book on the history of computing a book about history, or computing, or both?

Your Question: Consider each of the papers we have read in this course. For each, tell whether you think the content would be useful in solving Nile.com's problem described above, and if so, how? It is OK to say ``no help'' for some of the papers, but you should make positive use of ideas from at least a few of these papers. If you think a paper will not address this problem, give a sentence stating why. If you describe an algorithm, emphasize the important steps rather than the details.

Discuss the running time of your algorithm(s). You should use some secondary-storage model of complexity, such as number of passes through the data or number of disk I/O's. Also, if there are significant main-memory portions of your algorithm, make sure that a reasonable machine will have enough memory, and that the main-memory algorithm you propose can be performed on a realistic machine in a reasonable amount of time.

Also state any assumptions you make. You will have to make some assumptions, since the problem is not specified to the ultimate level of detail. However, your assumptions should be consistent with reality, e.g., don't assume that there are 10 books that almost all customers buy, or that the typical customer buys 10,000 books.

Some Words of Advice

I'm concerned that people may try to read something into this exam question that is not there. Here is what I would like to see: