Abstract: Security when Collaborating
Panel presentation on “Trust and Security in
Biological Databases”; Gio Wiederhold, Ph.D., Stanford University, CA
Traditional security mechanisms have focused on
access control, assuming that we can distinguish the good guys from the
bad guys and can label any data collection as being accessible to the
good guys. If those assumptions hold, the technology is conceptually
simple, and made hard only by technical faults. However, there are many
practical situations where such sharp distinctions cannot be made, so
that the technologies developed for access control become inadequate. In
medicine, but also in many commercial settings, we find unstructured data
collections. Such data are collected and stored without the submitter
being fully aware of their future uses, and hence without being able to
anticipate all future access needs. A complementary technology to augment
access control is result filtering: namely, inspecting the contents of
documents before they leave the boundary of the protected system.
I will briefly illustrate the issue in two settings, one simple and one
more complex. Military documents have long been assigned mandatory and
discretionary classifications, and legitimate accessors are identified
with respect to those categories. But when a new situation arises, the
old labels are inadequate. When we had to share information with the
Russians in Kosovo, no adequate labeling existed, and relabeling all
stored documents was clearly impractical. A filter can be written to
check the text for limited, locally relevant contents and make those
available. Any document containing unrecognized noun phrases would be
withheld, or could be handed over to a security officer for manual
processing.
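As a concrete illustration, a minimal filter of this kind might look as
follows in Python; the whitelist of approved terms and the crude
capitalized-phrase extractor (standing in for a real noun-phrase chunker)
are invented for illustration.

    import re

    # Hypothetical whitelist of locally relevant, releasable terms.
    APPROVED_TERMS = {"Kosovo", "KFOR", "Pristina"}

    def extract_noun_phrases(text):
        # Crude proxy for noun-phrase extraction: runs of capitalized
        # words; a production filter would use a real tagger and chunker.
        return set(re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", text))

    def release_decision(text):
        unrecognized = extract_noun_phrases(text) - APPROVED_TERMS
        if unrecognized:
            # Withhold, or queue for a security officer's manual review.
            return ("WITHHELD", sorted(unrecognized))
        return ("RELEASED", [])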
More complex situations occur when we have statistical data, such as
census data or, as in bioinformatics, phenotypic and genomic data. We
want to prevent the release of statistical summaries for cells that have
fewer than, say, 10 instances, to reduce the likelihood of inference back
to an individual. If we use access control, we have to precompute the
minima for columns and rows and aggregate their categorizations for
access to prevent release. However, the distributions in those cells are
very uneven. So if we check the actual contents at the time of release,
we can allow much smaller categories to be used for access and omit or
aggregate only those cells that are too small.
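Such a release-time check is easy to sketch; the threshold of 10 follows
the text, while the table and the pooling of small cells into a residual
"other" cell are invented for illustration.

    THRESHOLD = 10  # minimum cell count before release

    def filter_cells(table):
        # Release large cells as-is; pool the small ones so that no
        # count below the threshold leaves the system on its own.
        released, residual = {}, 0
        for cell, count in table.items():
            if count >= THRESHOLD:
                released[cell] = count
            else:
                residual += count
        if residual >= THRESHOLD:
            released["other (aggregated)"] = residual
        # else: omit entirely; even the pooled count is too small
        return released

    counts = {("rural", "allele A"): 42, ("urban", "allele B"): 7,
              ("rural", "allele B"): 5}
    print(filter_cells(counts))
    # {('rural', 'allele A'): 42, 'other (aggregated)': 12}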
Checking results as they are released can also provide a barrier against
credit card theft and the like. If a person masquerading as a customer
locates a trapdoor and removes 10,000 credit card numbers instead of an
MP3 tune, that can easily be recognized, since those data have very
different signatures.
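One way to sketch such a signature check at the release boundary, with an
invented card-number pattern and limits, assuming the system knows what
kind of result each transaction should produce:

    import re

    CARD_PATTERN = re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b")

    def plausible_release(payload, expected_kind):
        # Compare outgoing data against the signature expected for this
        # transaction type before letting it cross the boundary.
        text = payload.decode("utf-8", errors="ignore")
        card_hits = len(CARD_PATTERN.findall(text))
        if expected_kind == "mp3_download":
            # A music file should contain essentially no card numbers.
            return card_hits == 0
        if expected_kind == "customer_record":
            return card_hits <= 1  # at most the customer's own card
        return False  # unknown transaction types are blocked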
In summary, many of our accessors are collaborators or customers, even
though we know little about them. We want to give them the best possible
service, and still protect our property and the privacy that individuals
trust us to keep. Focusing only on access control, and not checking what
is released, is an inadequate, even naive, approach for systems involving
collaboration.
Research leading to these concepts and supporting technologies was
supported by NSF under the HPCC and DL2 programs.