The vast majority of biological experiments do not have standardized templates. The results of these experiments are still predominantly disseminated in published texts accompanied by figures and tables for summary and convenience. While this format is useful for knowledge extraction by readers on a per-article basis, it does not allow for efficient integration of all data relevant to a particular topic, and it is certainly not amenable to computer-based data extraction for further computation on these data.
To show the value of structured representations of data in dealing with these critical issues, we have built a prototype knowledge base (RiboWEB) of structural data pertaining to the small (30S) ribosomal subunit of E. coli. Diverse types of data, taken principally from published journal articles, are represented using a set of templates within this knowledge base, and these data are linked to each other with numerous and rich connections. Not only does this representation allow for easier and more convenient data retrieval by human users, but it also facilitates automated data analysis by computer programs. We believe that formal representations of the data and models within scientific subdisciplines hold promise as a key method for delivering the next generation of scientific data resources and represent the way in which scientific data should be published in the future.
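
As an illustration of what template-based, linked data can look like in practice, the following is a minimal Python sketch. It is not the RiboWEB implementation: the class names (`Reference`, `Observation`, `KnowledgeBase`), their fields, and all entries are hypothetical placeholders chosen only to show how filling fixed templates and linking records lets a program collect everything known about one component.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Reference:
    """Hypothetical template for the publication a datum was taken from."""
    citation: str


@dataclass(frozen=True)
class Observation:
    """Hypothetical template for one experimental result."""
    kind: str                    # category of experiment (placeholder values below)
    components: Tuple[str, ...]  # molecular components the result involves
    source: Reference            # link back to the originating article


@dataclass
class KnowledgeBase:
    """Stores observations and indexes them by the components they mention."""
    observations: List[Observation] = field(default_factory=list)
    _by_component: Dict[str, List[Observation]] = field(default_factory=dict)

    def add(self, obs: Observation) -> None:
        """Record an observation and link it to each component it names."""
        self.observations.append(obs)
        for name in obs.components:
            self._by_component.setdefault(name, []).append(obs)

    def about(self, component: str) -> List[Observation]:
        """Return every observation linked to the named component."""
        return list(self._by_component.get(component, []))


# Placeholder entries only; these are not real experimental data.
paper_1 = Reference("placeholder citation 1")
paper_2 = Reference("placeholder citation 2")

kb = KnowledgeBase()
kb.add(Observation("proximity", ("component-A", "component-B"), paper_1))
kb.add(Observation("proximity", ("component-A", "component-C"), paper_2))
kb.add(Observation("protection", ("component-B",), paper_2))

# Gather all components reported together with component-A, across articles.
partners = sorted(
    {c for o in kb.about("component-A") for c in o.components if c != "component-A"}
)
```

Because every record follows the same template and keeps a link to its source, a query such as `kb.about("component-A")` integrates results from several articles in one step, which is the kind of retrieval that free-text publication does not support.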