| CS345 Final Exam
|
Directions
-
You have from 9AM Friday June 2 through 5PM June 5, 2000 to answer this
exam.
-
Return the exam to 411 Gates.
-
You may use any written materials, but you must cite
sources when appropriate.
Since we expect most of these sources will be the course readings, these
readings have been numbered on the Class Web Page
so you can refer to them succinctly.
For example, the Fastmap paper is [5], and Kleinberg's paper is [9a].
-
Do not forget to sign the Honor Code.
Of course, the work you hand in must be your own; no collaboration or
borrowing from others is permitted.
-
You are limited to 5 pages at 10 pt.
Alternatively, you may use 6 pages at 11pt. or 7 pages at 12pt.
The latter options will be appreciated by the instructor, but do not
represent an increase in the alloted verbiage, since the number of pages
needed for a given document rises approximately with the square
of the point size.
The Exam
Nile.com is a large, on-line bookseller.
They have 10 million customers and sell one million different books.
Their sales history may be represented by
a matrix with the entry in row r
and column c set to 0 if customer r has not bought book
c, and set to the date of purchase if customer r did buy
book c.
You may choose how this matrix is stored, as long as the
representation does not conceal the solution to the problem in an
unrealistic way.
Nile wishes to recommend to each customer books that they might like,
based on the information available.
If, for example, they determine that the customer is a
``database person,'' because they have bought several ``database books''
in the past, then they might recommend another database book.
On the other hand, if they already bought one book on SQL2 then they
might be less likely to buy another book on the same topic, but instead
might buy a book on transaction processing, or Oracle-8i.
Perhaps, the historical behavior of other ``similar''
customers would
be helpful.
On the other hand, the notion of ``similar customer'' is
slippery.
Two ``database people'' may have bought zero books in common, for
example.
Worse, people usually aren't of one kind.
Some ``database people'' will also buy books about travel, some about
music, others buy romance novels, so customers can fall into several
categories.
Moreover, the notion of ``database book'' is slippery as well.
Perhaps it would be more useful to classify books as ``Oracle,''
``Sybase,'' ``SQL,'' ``OQL,'' and similar, possibly overlapping,
categories.
Is a book on the history of computing a book about history, or
computing, or both?
Your Question:
Consider each of the papers we have read in this course.
For each, tell whether you think the content would be useful in solving
Nile.com's problem described above, and if so, how?
It is OK to say ``no help'' for some of the papers, but you should
make positive use of ideas from at least a few of these papers.
If you think a paper will not address this problem, give a sentence stating
why.
If you describe an algorithm, emphasize the important steps rather than
the details.
Discuss the running time of your algorithm(s).
You should use some secondary-storage model of complexity, such as
number of passes through the data or number of disk I/O's.
Also, if there are significant main-memory portions of your algorithm,
make sure that a reasonable machine will have enough memory, and that
the main-memory algorithm you propose can be performed on a realistic
machine in a reasonable amount of time.
Also state any assumptions you make.
You will have to make some
assumptions, since the problem is not specified to the ultimate level of
detail.
However, your assumptions should be consistent with reality, e.g., don't
assume that there are 10 books that almost all customers buy, or that
the typical customer buys 10,000 books.
Some Words of Advice
I'm concerned that people may try to read something into this exam question
that
is not there.
Here is what I would like to see:
-
First, there isn't only one right answer.
-
I would like people to show that they understand some, or preferably
all, of the papers and topics covered by discussing their use in a
situation that is neither the same as what was covered in class, nor
completely different.
-
I am aware that the page limit requires you to choose your topics
carefully, and I also expect that you will have to ration what you say
about each one.
While you might like to go into extreme detail about, say, an algorithm,
you will not have the space to do so.
Rather, write down the
most important or most tricky points, to show that you have thought the matter
through, relying on me to accept that you could also do the more
standard things, or things that are like something in a paper you cite.
-
I will be answering email during the weekend, and will respond to
questions of scope or form (not content, of course).
My email is
ullman @ db.stanford.edu.
I also suggest that you check email over the weekend, in case there are
clarifications that I need to broadcast to the entire class.