Automatic Extraction of Data from 2-D Plots in Documents

Xiaonan Lu, James Z. Wang, Prasenjit Mitra and C. Lee Giles
The Pennsylvania State University

Two-dimensional (2-D) plots in digital documents contain important information. The results of scientific experiments and the performance of businesses are often summarized using plots. Although 2-D plots are easily understood by human users, current search engines rarely use the information contained in plots to enhance the results returned in response to end-user queries. We propose an automated algorithm for extracting information from line curves in 2-D plots. The extracted information can be stored in a database and indexed to answer end-user queries and enhance search results. We have collected 2-D plot images from a variety of sources and tested our extraction algorithms on them. Experimental evaluation has demonstrated that our method can produce results suitable for real-world use.

PDF file (388KB)

Citation: Xiaonan Lu, James Z. Wang, Prasenjit Mitra, and C. Lee Giles, "Automatic Extraction of Data from 2-D Plots in Documents," Proceedings of the International Conference on Document Analysis and Recognition, pp. 188-192, Parana, Brazil, September 2007.

Copyright 2007 ICDAR. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Last Modified: August 15, 2007.
© 2007, James Z. Wang