Gio Wiederhold

Stanford University

March 2002, updated February 2003

This brief essay is intended to improve the understanding of the differences between the practice of science in the biological sciences and in the mathematical sciences. This issue is moving to the forefront as these sciences, and the scientists in them, increase their collaboration. We do not advocate that any scientists change their approaches, and certainly don't expect that scientists will change their habits. However, recognizing that fundamental distinctions exist in the practice of science is essential to support an effective interaction.

In the discussion that follows I will use medicine as the example in the biological arena and computer science as the model for the mathematical sciences. I do believe that the comparison has broader validity, but my experience is strongest in those specific fields, and in the interaction between them. My own dual role started in 1961, when I was responsible for supporting computing for scientists both at UC Berkeley and at UC San Francisco, a medical school. It took a long time before I was able to express the issue.

The formal point being made is that in medicine the scientific model is implicitly linear, while in the computational sciences the complexity of a problem solution is made explicit, often through the big-Oh notation, and often turns out to be frustratingly non-linear. The formal distinction transcends this algorithmic abstraction. At a higher level than this distinction are differences in the scientific paradigm being used, and those have created differences in value assessments, publication conventions, academic promotions, and approaches to education.
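The contrast between linear and non-linear growth can be made concrete with a small sketch. It is only an illustration under assumed cost models, not a claim about any particular clinical or computational system: the function names and the three cost models (handling n conditions one at a time, checking every pair of conditions for interaction, and considering every combination of conditions) are hypothetical choices made for this example.

```python
def linear_cost(n):
    """Steps needed if each of n conditions is handled independently: O(n)."""
    return n


def pairwise_cost(n):
    """Steps needed if every pair of conditions may interact: O(n^2)."""
    return n * (n - 1) // 2


def subset_cost(n):
    """Steps needed if every combination of conditions matters: O(2^n)."""
    return 2 ** n


def growth_table(sizes):
    """Return (n, linear, pairwise, subsets) for each problem size n."""
    return [(n, linear_cost(n), pairwise_cost(n), subset_cost(n)) for n in sizes]


if __name__ == "__main__":
    print(f"{'n':>4} {'linear':>8} {'pairwise':>10} {'subsets':>16}")
    for n, lin, pair, sub in growth_table([5, 10, 20, 40]):
        print(f"{n:>4} {lin:>8} {pair:>10} {sub:>16}")
```

Under these assumed models, going from 10 to 20 conditions doubles the linear cost, roughly quadruples the pairwise cost, and multiplies the subset cost by about a thousand. That is the sense in which a "next level case" can be far more than one step harder.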

In medical education, for instance, once the basic sciences have been covered, clinical competence is taught by examples. A few instances of each type of procedure, say a surgical intervention, are gone through, first by watching, then by active assistance, and finally by monitored execution. Once a resident, the young physician works more or less alone, and gains further experience by performing additional cases. The level of each type of case varies, but within bounds. Learning occurs in a linear progression. If the bounds are exceeded, the problem typically receives a fancier name and a specialist is called in.

Complex problems will arise due to interaction with other conditions. Here experience helps, but solutions are poorly formalized. For instance, systems built in medicine to aid in diagnosis will fail when multiple problems are presented, and the interaction of genotypic differences with disease onset and treatment is just starting to be recognized. Analytical models severely limit the number of independent variables, and by accumulating statistics over many randomized observations the effect of the remaining confounding factors can be ignored. A hypothesis is proven when the result is unlikely to be due to random factors.

Computing has a dual heritage. It is partially grounded in mathematics and partially grounded in engineering, and there is still tension between the two divisions. The mathematical approach relies on a formalization of a problem, defining a hypothesis in that space, and finding a definite proof. A single counterexample will invalidate the result. Moving closer to realism often requires larger models. Expanding a model to account for additional variables will require new proof methods, and can easily exceed existing proof capabilities.

The engineering objective is effective implementation. Engineers rarely deal with the unknown or absolutely infeasible. The space of solutions is typically quite large, being composed of several alternative designs or software architectures, each in a wide variety of module choices, sizes, and materials. Searching the entire space for the best design is rarely affordable or feasible, and a much simpler model will be used for assessments. The designs will be pruned to retain the most likely alternatives. Many variables will be held fixed, such as the choice of material. The performance of those designs will be evaluated over a limited range of requirements. The result will represent an optimum in the reduced space. Subsequently, refinements will take place, but there is no assurance that a true, global optimum is reached.
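The gap between an optimum in the reduced space and a global optimum can be sketched with a toy example. Everything here is an assumption made for illustration: the design variables, the list of materials, and in particular the score function are invented, and the score is deliberately given an interaction term so that holding one variable fixed hides the best design.

```python
from itertools import product

# Hypothetical design space: architecture index, size, and material.
ARCHITECTURES = (0, 1, 2)
SIZES = (1, 2, 3, 4, 5)
MATERIALS = ("steel", "wood", "composite")

# Invented per-material adjustment to the figure of merit.
MATERIAL_BONUS = {"steel": 0, "wood": -1, "composite": 2}


def score(design):
    """Invented figure of merit (higher is better) for a design triple."""
    arch, size, material = design
    # Interaction term: richer architectures pay off only with composite.
    interaction = arch if material == "composite" else -arch
    return 10 - (size - 3) ** 2 + interaction + MATERIAL_BONUS[material]


def best_design(archs, sizes, materials):
    """Exhaustively search the given (possibly reduced) space."""
    return max(product(archs, sizes, materials), key=score)


if __name__ == "__main__":
    # Full space: affordable here only because the toy space is tiny.
    global_best = best_design(ARCHITECTURES, SIZES, MATERIALS)
    # Reduced space: architectures pruned to two, material held fixed.
    reduced_best = best_design((0, 1), SIZES, ("steel",))
    print("global :", global_best, score(global_best))
    print("reduced:", reduced_best, score(reduced_best))
```

In this toy space the reduced search is correct within its own bounds, yet it cannot find the better design, because that design depends on a variable that was held fixed.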

In both divisions solutions are obtained by simplification of the real world. The models reduce the world to create a manageable abstraction. Scientific insight allows the modeler to manage the highest feasible extent of complexity. The art is to incorporate the significant variables into the model. Measurement of complexity (the big-Oh notation) is a fundamental concept. Progress is measured quantitatively: the number of factors that can be extracted from a large data collection, the number of words that a speech recognition program can handle, the number of diagnoses that a series of clinical tests can deconvolute, the number of relevant items obtained by a web query, the cost, span, and capacity of a bridge.

A computational success will be seen in a medical setting as a typical example, and failing to handle the next-level case is disappointing to the medical establishment. A clinical success will not be seen as proof of a general approach by mathematically oriented scientists. Repeating experiments over different populations is essential in biology to gain confidence in statistical results and allow generalization; repeating a mathematical proof is only done when a prior result is suspicious.

More generally, expectations of what colleagues from the other domain can achieve, and of how well their experience can substantiate an opinion, will be unfulfilled when viewed within an inappropriate scientific paradigm. A biologist often sees the computer scientist only as a craftsman supplying programs of limited scope. The computer scientist sees the biologist as a provider of dirty and unreliable data.

Complaining to one's close colleagues about the apparently poor services and science practiced outside further reinforces xenophobic arrogance. When the other parties sense such arrogance, the trust needed to achieve scientific collaboration is no longer available.

When dealing with groups that use other fundamental paradigms, one must realize that more is required than just learning a new vocabulary. The analytic models will differ as well. Accepting that the scientific paradigm differs, and that there is a valid reason for the difference, is a good start when trying to collaborate with researchers outside of one's own domain.

As a computer scientist I am tempted to generalize these notions to other domains, such as interacting political groupings, religions, races, and nation-states. Wearing my other hat, I will refrain from making such an extrapolation without first being able to gain substantial experience in those interactions. Time will not allow me to get there.

============================