Detecting Digital Copyright Violations on the Internet


Narayanan Shivakumar
Department of Computer Science,
Stanford University.
shiva@db.stanford.edu

Advisor: Hector Garcia-Molina
Filed: August 1999.


Abstract: Cyber-pirates are offering music CDs, video clips, and books on the web in digital format to a large audience at virtually no cost. Content publishers such as Disney and Sony Records therefore expect to lose several billion dollars in copyright revenues over the next few years. To address this problem, we propose building a copy detection system (CDS) with which content publishers register their valuable digital content. The CDS then crawls the web, compares the web content to the registered content, and notifies the content owners of illegal copies. In this dissertation, we discuss how to build such a system so that it is accurate, scalable (e.g., to hundreds of gigabytes of data, or millions of web pages), and resilient to "attacks" (e.g., copying a resampled audio clip) from cyber-pirates. We also discuss two prototype CDSs we have built as "proofs of concept": (1) SCAM (Stanford Copy Analysis Mechanism) for text documents, and (2) DECEIVE (Detecting Copies of Internet Video) for video sequences.
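
The register, compare, and notify loop described in the abstract can be illustrated for text with a small sketch. This is not the SCAM algorithm; it substitutes a generic technique (overlapping word chunks, or "shingles", scored by containment) purely for illustration. The function names, the chunk size k=3, the 0.5 threshold, and the example URLs are all assumptions, and crawling is replaced by a dictionary of pages that have already been fetched.

```python
import re


def shingles(text, k=3):
    """Return the set of overlapping k-word chunks in text (lowercased)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(registered, candidate, k=3):
    """Fraction of the registered work's chunks that also occur in the candidate."""
    reg = shingles(registered, k)
    if not reg:
        return 0.0  # too short to compare at this chunk size
    return len(reg & shingles(candidate, k)) / len(reg)


def detect_copies(registry, pages, threshold=0.5, k=3):
    """Return (owner, url, score) for every page that overlaps a registered work.

    registry: hypothetical mapping of owner -> registered text
    pages:    hypothetical mapping of url -> already-crawled page text
    """
    hits = []
    for url, page_text in pages.items():
        for owner, work in registry.items():
            score = containment(work, page_text, k)
            if score >= threshold:
                hits.append((owner, url, score))
    return hits


# Illustrative data only; the owner name and URLs are made up.
registry = {"publisher": "the quick brown fox jumps over the lazy dog"}
pages = {
    "http://a.example/copy": "intro text the quick brown fox jumps over the lazy dog outro",
    "http://b.example/other": "completely unrelated words about weather and cooking today",
}
hits = detect_copies(registry, pages)
```

Comparing every page against every registered work, as this sketch does, would not scale to the data sizes the abstract mentions, and exact chunk matching offers little resilience to deliberate alteration; those are the problems the dissertation addresses.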


Thesis in PostScript format (~5 MB)
Thesis in PDF format (~1.3 MB)
Thesis in gzipped PostScript format (~660 KB)
Thesis in gzipped PDF format (~1.0 MB)