Web Search, Digital Libraries, and Metadata

Steve Lawrence*
NEC Research Institute


The web and search engines represent a significant improvement in information access; however, there is much room for improving existing techniques. Our results show that search engines index only a fraction of all publicly indexable web pages, do not index sites equally, and may not index new pages for months. We also analyze metadata and the volume and distribution of information on the web. We discuss CiteSeer, which is the largest free full-text index of scientific literature in the world. CiteSeer automatically extracts metadata from research articles and provides a number of novel features, including autonomous citation indexing and the extraction of citation context.


