Detecting Digital Copyright Violations on the Internet
Department of Computer Science,
Advisor: Hector Garcia-Molina
Filed: August 1999.
Cyber-pirates are offering music CDs, video clips and books on the web
in digital format to a large audience at virtually no cost.
Content publishers such as Disney and Sony Records therefore
expect to lose several billion dollars in copyright revenues
over the next few years. To address this problem, we propose
building a copy detection system (CDS), in which content publishers
register their valuable digital content. The CDS then crawls the web,
compares the web content to the registered content, and notifies the
content owners of illegal copies. In this dissertation, we discuss how to
build such a system so it is accurate, scalable (e.g., to hundreds of
gigabytes of data, or millions of web pages) and resilient to
"attacks" (e.g., copying a resampled audio clip) from cyber-pirates. We also
discuss two CDS prototypes we have built as "proofs of concept":
(1) SCAM (Stanford Copy Analysis Mechanism) for text documents,
and (2) DECEIVE (Detecting Copies of Internet Video) for video sequences.
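
The register, compare, and notify steps described above can be illustrated with a minimal sketch for text. This is not SCAM's actual algorithm: it assumes a simple fingerprint made of hashed k-word windows ("shingles") and a containment score, and the names CopyDetector, shingles, scan, k, and threshold are all hypothetical. The crawl step is replaced by a dictionary of already fetched pages, so nothing here touches the network.

```python
import hashlib
import re


def shingles(text, k=5):
    """Return the set of hashed k-word windows (shingles) found in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        windows = [" ".join(words)] if words else []
    else:
        windows = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    return {hashlib.sha1(w.encode("utf-8")).hexdigest()[:16] for w in windows}


class CopyDetector:
    """Hypothetical registry of owner-supplied documents (assumption)."""

    def __init__(self, k=5, threshold=0.5):
        self.k = k                  # shingle length in words (assumed value)
        self.threshold = threshold  # minimum containment to report (assumed)
        self.registered = {}        # document name -> set of shingle hashes

    def register(self, name, text):
        self.registered[name] = shingles(text, self.k)

    def check(self, page_text):
        """Return (name, containment) pairs for registered docs found in the page."""
        page = shingles(page_text, self.k)
        hits = []
        for name, fingerprint in self.registered.items():
            if not fingerprint:
                continue
            # Fraction of the registered document's shingles present in the page.
            containment = len(fingerprint & page) / len(fingerprint)
            if containment >= self.threshold:
                hits.append((name, containment))
        return sorted(hits, key=lambda hit: -hit[1])


def scan(detector, pages):
    """Stand-in for the crawl step: pages maps URL -> already fetched text."""
    report = {}
    for url, text in pages.items():
        hits = detector.check(text)
        if hits:
            report[url] = hits
    return report
```

A report produced by scan maps each suspicious URL to the registered documents it overlaps, which is the information a notification to the content owner would need. Containment is used rather than symmetric similarity so that a registered document embedded in a much larger page still scores highly; a resilient system would need fingerprints that survive the kinds of attacks mentioned above, which this sketch does not attempt.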
Thesis in PostScript format (~5 MB)
Thesis in PDF format (~1.3 MB)
Thesis in gzipped PostScript format (~660 KB)
Thesis in gzipped PDF format (~1.0 MB)