Deriving Knowledge from Figures for Digital Libraries
Xiaonan Lu, James Z. Wang, Prasenjit Mitra, and C. Lee Giles
The Pennsylvania State University
Figures in digital documents contain important information.
Current digital libraries do not summarize and index
information available within figures for document retrieval.
We present a system for the automatic categorization of
figures and extraction of data from 2-D plots. A machine-learning-based
method is used to categorize figures into
a set of predefined types based on image features. An
automated algorithm is designed to extract data values from
solid line curves in 2-D plots. The semantic type of figures
and extracted data values from 2-D plots can be integrated
with textual information within documents to provide more
effective document retrieval services for digital library users.
Experimental evaluation has demonstrated that our system
can produce results suitable for real-world use.
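
As a rough illustration of what extracting data values from a solid line curve can involve, here is a minimal Python sketch. It is not the algorithm from the paper, which the abstract does not describe. It assumes that the curve has already been isolated as a binary pixel mask cropped to the plot area, and that the axis ranges are known. The function name `extract_curve_points` and its parameters are hypothetical.

```python
def extract_curve_points(mask, x_range, y_range):
    """Map curve pixels in a binary mask to (x, y) data values.

    Assumptions (not from the paper): `mask` is a list of rows of 0/1
    values cropped exactly to the plot area, row 0 is the top of the
    plot, and the axes are linear with the given (min, max) ranges.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    if height < 2 or width < 2:
        raise ValueError("mask must be at least 2 x 2 pixels")

    x_min, x_max = x_range
    y_min, y_max = y_range
    points = []
    for col in range(width):
        # Collect the rows where the curve is present in this column.
        rows = [r for r in range(height) if mask[r][col]]
        if not rows:
            continue  # the curve does not cross this column
        # A solid line may be several pixels thick, so use its centre.
        row = sum(rows) / len(rows)
        # Linear pixel-to-data mapping; image rows grow downwards.
        x = x_min + (x_max - x_min) * col / (width - 1)
        y = y_max - (y_max - y_min) * row / (height - 1)
        points.append((x, y))
    return points


# A 3 x 3 mask holding a diagonal line from bottom-left to top-right.
example_mask = [
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
]
example_points = extract_curve_points(example_mask, (0, 10), (0, 100))
print(example_points)
```

A real system would also have to locate the axes, read the tick labels, and separate the curve from grid lines and text before a mapping of this kind could be applied.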
Xiaonan Lu, James Z. Wang, Prasenjit Mitra, and C. Lee Giles,
"Deriving Knowledge from Figures for Digital Libraries,"
Proceedings of the International World Wide Web Conference,
pp. 1229-1230, Banff, Alberta, Canada, May 2007.
Copyright 2007. Permission to make digital or hard copies of all or
part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or
commercial advantage and that copies bear this notice and the full
citation on the first page. To copy otherwise, to republish, to post
on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
March 12, 2007.
© 2007, James Z. Wang