More sample project concepts
identifying near-duplicate docs (throwing out ephemeral text)
spam detection
- query log spamming
- falsified hub/authority web structures
query log mining
- adjusting doc relevance scores
- identifying common problem queries (no results, too many clicks after search)
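
As a starting point for the problem-query idea, here is a minimal sketch in Python. It assumes a hypothetical log format (the LogEntry record, its field names, and the thresholds max_clicks and min_count are all made up for illustration; a real query log will look different). It groups log entries by normalized query text, then flags queries that mostly return no results or that average many clicks per search.

```python
from collections import defaultdict
from typing import NamedTuple


class LogEntry(NamedTuple):
    """One search event. Hypothetical format, assumed for illustration."""
    query: str
    num_results: int  # number of results returned for this search
    num_clicks: int   # result clicks the user made after this search


def find_problem_queries(log, max_clicks=3, min_count=2):
    """Return {query: reason} for queries that look problematic.

    max_clicks and min_count are arbitrary illustrative thresholds.
    """
    # Per query: [times seen, times with zero results, total clicks]
    stats = defaultdict(lambda: [0, 0, 0])
    for entry in log:
        key = entry.query.strip().lower()  # crude normalization
        s = stats[key]
        s[0] += 1
        s[1] += 1 if entry.num_results == 0 else 0
        s[2] += entry.num_clicks

    problems = {}
    for query, (count, zero_hits, clicks) in stats.items():
        if count < min_count:
            continue  # too rare to call "common"
        if zero_hits * 2 > count:
            problems[query] = "no results"
        elif clicks / count > max_clicks:
            problems[query] = "too many clicks"
    return problems


# Made-up sample data, not taken from any real log.
sample_log = [
    LogEntry("cheap flights", 120, 1),
    LogEntry("Cheap Flights", 95, 2),
    LogEntry("xyzzy manual", 0, 0),
    LogEntry("xyzzy manual", 0, 0),
    LogEntry("python tutorial", 300, 6),
    LogEntry("python tutorial", 310, 5),
    LogEntry("rare query", 0, 0),
]

if __name__ == "__main__":
    for query, reason in find_problem_queries(sample_log).items():
        print(f"{query}: {reason}")
```

Averaging over repeated occurrences, rather than flagging single searches, keeps one-off typos and unusual sessions from dominating the report. A real project would probably want session boundaries and dwell time as well, which this sketch ignores.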