Pathway/Genome Databases and Software Tools

Peter D. Karp, Ph.D.
Bioinformatics Research Group
SRI International


One revolution sweeping molecular biology is the high-throughput generation of massive amounts of experimental data, such as the data produced by the Human Genome Project. A second revolution is shifting the substrate of biological information from the biological literature to structured databases. A pathway/genome database (DB) integrates information about the genome, proteins, and biochemical pathways of an organism. For example, the EcoCyc DB describes the full genome and metabolic-pathway complement of E. coli. EcoCyc is the first DB to describe the full biochemical network of an organism. Thousands of scientists use it for tasks ranging from metabolic engineering of bacteria to analysis of other bacterial genomes.

The Pathway Tools software developed in conjunction with EcoCyc includes algorithms for interrogation, visualization, editing, and WWW publishing of pathway/genome DBs. The EcoCyc project has been a rich environment for computer-science and bioinformatics research. The talk will describe several computational contributions of the project including (a) the Ocelot object/relational database manager, (b) a reusable, schema-driven object-database editor, (c) a system for dynamically translating X-windows into HTML and GIF images, (d) hierarchical graph-layout algorithms for displaying the cellular biochemical network, and (e) algorithms for prediction and analysis of cellular biochemical networks.


Peter D. Karp received the Ph.D. degree in Computer Science from Stanford University in 1989. He was a postdoctoral fellow at the National Center for Biotechnology Information at the National Institutes of Health. He was a vice president at DoubleTwist, Inc., a bioinformatics company. He has spent seven years at the Artificial Intelligence Center at SRI International, where he now directs a bioinformatics research group. His research interests include knowledge representation and database systems, machine learning, scientific databases, and computing with biochemical networks.