Solution for CS345 Final

The best ideas I found were:

  1. Use low-support, high-correlation mining to find pairs of books that have similar customers. I suspect that even a really low similarity threshold would identify a relatively small number of book-pairs that are related. For instance, if two books are each bought by 100 people, and two of those people buy both books, that overlap is probably unusual and significant. It was good to see that many people observed that, with 1,000,000 columns, a min-hash scheme might require more main memory than was available. The best solutions suggested were an LSH scheme or a partition of the columns into some small number of groups.

  2. Use a non-Euclidean clustering algorithm to cluster books. The distance function has to be based on the similarity, with high similarity = low distance.
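
As a rough illustration of the two ideas above, here is a minimal Python sketch, not a reference solution. It assumes that similarity means Jaccard similarity of the customer sets, that customer IDs are distinct nonnegative integers smaller than the prime used for hashing, and that distance is taken as 1 minus similarity. The names `jaccard`, `jaccard_distance`, `minhash_signature`, and `lsh_candidate_pairs` are invented for this example, and the sketch keeps every signature in memory, so it does not by itself address the main-memory concern.

```python
import random
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two customer sets (assumed similarity measure)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def jaccard_distance(a, b):
    """High similarity = low distance; usable by a non-Euclidean clusterer."""
    return 1.0 - jaccard(a, b)


def minhash_signature(customers, hash_params, prime):
    """One min-hash value per (a, b) pair, using h(c) = (a*c + b) mod prime."""
    return [min((a * c + b) % prime for c in customers) for a, b in hash_params]


def lsh_candidate_pairs(buyers, num_hashes=20, rows_per_band=1, seed=0):
    """Return book pairs whose signatures agree on at least one band.

    buyers maps a book to the set of integer customer IDs who bought it.
    Few rows per band and many bands favor low-similarity pairs.
    """
    rng = random.Random(seed)
    prime = 2_147_483_647  # 2**31 - 1; customer IDs are assumed smaller
    hash_params = [(rng.randrange(1, prime), rng.randrange(prime))
                   for _ in range(num_hashes)]
    signatures = {book: minhash_signature(customers, hash_params, prime)
                  for book, customers in buyers.items() if customers}

    candidates = set()
    for start in range(0, num_hashes, rows_per_band):
        buckets = defaultdict(list)
        for book, sig in signatures.items():
            buckets[tuple(sig[start:start + rows_per_band])].append(book)
        for books in buckets.values():
            candidates.update(combinations(sorted(books), 2))
    return candidates


# The example from the text: 100 buyers each, 2 buyers in common.
x = set(range(0, 100))
y = set(range(98, 198))
print(jaccard(x, y))           # 2 / 198, roughly 0.0101
print(jaccard_distance(x, y))  # roughly 0.9899
```

With one row per band and b bands, a pair of similarity s becomes a candidate with probability 1 - (1 - s)^b, so more bands are needed to catch pairs at the very low similarities discussed above. Candidate pairs would still have to be checked against the actual customer sets before being reported.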

If you said this much (and nothing else), you would, under my grading system, have a score that is so high there is no grade on the grade-sheet high enough. Unfortunately, most people either said these things as one of many possibilities or said some things that really won't work.

Common Errors

I have the feeling that there is a books-authors type of mining that could be useful, but I haven't seen anyone make a good case. Intuitively, you seed with some obvious books in a class, find customers who bought some of those books, find other books heavily bought by those customers, and so on.
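
The seed-and-expand intuition can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the function name `expand_seed`, the thresholds `min_seed_hits` and `min_share`, and the fixed number of rounds do not come from the exam or from any submitted answer.

```python
from collections import Counter


def expand_seed(purchases, seed_books, min_seed_hits=2, min_share=0.5, rounds=3):
    """Grow a class of books from a seed set.

    purchases maps a customer to the set of books that customer bought.
    A customer counts as a "fan" after buying at least min_seed_hits books
    of the current class; a book joins the class when at least min_share
    of the fans bought it. Both thresholds are hypothetical.
    """
    books = set(seed_books)
    for _ in range(rounds):
        fans = [c for c, bought in purchases.items()
                if len(bought & books) >= min_seed_hits]
        if not fans:
            break
        counts = Counter(b for c in fans for b in purchases[c])
        added = {b for b, n in counts.items() if n / len(fans) >= min_share} - books
        if not added:
            break  # the class has stopped growing
        books |= added
    return books


purchases = {
    "c1": {"s1", "s2", "x"},
    "c2": {"s1", "s2", "x"},
    "c3": {"s1", "y"},
    "c4": {"z"},
}
print(sorted(expand_seed(purchases, {"s1", "s2"})))  # ['s1', 's2', 'x']
```

Whether thresholds like these separate real classes of books from generally popular titles is exactly the case that would still have to be made.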

A Word About Diction

I didn't try to correct every misspelling, grammar error, or diction error. However, I did circle ``nonreferential 'this'es'' whenever I noticed one, and several of you had quite a few. The problem is that ``this'' is a pronoun (or an adjective). As a pronoun, it has to stand for the previous noun. Many people use ``this'' or ``that'' to stand for ``one of the points I just made --- you know what I mean.'' The trouble is that your reader doesn't know what you mean in all cases. Often, a nonreferential ``this'' hides the fact that you are not sure yourself why something is true. Thus, while I didn't deduct anything for this type of prose, I strongly recommend that you get nonreferential ``this''es out of your writing in the future. You'll be amazed at how much doing so sharpens your writing and forces you to be specific when dealing with tricky points!