HealthSecurity

Gio Wiederhold

Prepared for

Oct 2000

Future of Security and Privacy in Medical Information

Abstract

Today, issues of privacy and confidentiality in healthcare are dealt with largely informally. Little legislation exists, and the awkwardness of accessing paper records makes violations of patients’ privacy sporadic. As healthcare institutions move towards a future where all information is kept in an Electronic Medical Record (EMR), the casual attitudes that are prevalent will be in conflict with the desires and expectations of the patients. Legislation has been passed to make the holders of medical data responsible for securely protecting the patients’ privacy. Specific implementation guidelines are still lacking. There is much institutional resistance to the adoption of rigorous rules, but we expect that in the near future reliable procedures will have to be implemented to comply with both legal guidelines and patients’ expectations.

After introducing the issue more precisely we provide an overview of the concepts needed to understand the roles of privacy and security technology and of the people who must manage that technology. We then discuss the components of secure EMR systems and point out where adequate technology exists and where future improvements are essential. We conclude with some advice to healthcare management facing the demands for security and privacy that the future will bring.

1. Introduction.

This chapter assumes that in the near future most patient information will be stored in an Electronic Medical Record (EMR) [DickS:91]. We expect that required patient information will be rapidly and completely available to persons who should receive that information, and not be made available to anyone else. To achieve that simple objective many pieces of technology have to work correctly and reliably. We will identify most of these technological components because they are all interrelated, but not discuss many of them in depth because they are common to all computer-based information systems. We will focus on issues that are particular to the medical record domain. Unfortunately there are problems with medical records that are not handled adequately by the methods that are supplied in broad-purpose software [ClaytonEa:97]. Issues of security and privacy are not unique to medical information, but we will see that they become more complex in health care.

Protection of privacy depends greatly on having a secure system. Security first of all requires that persons accessing the system are properly identified, or authenticated. Once they are authenticated, they can be authorized to read or manipulate specific parts of the EMR. Healthcare records contain a wide variety of information, from relatively public demographic data to data that could be misused to embarrass a person or deny them employment, insurance, social, or residence opportunities. In between those categories is information of value to various organizations. Of primary concern is the delivery of information to a variety of caregivers, the physicians, consultants, nurses, pharmacists, etc. who depend on having complete and correct information to carry out their duties. External laboratories exchange crucial test information with the EMR. Hospital management has the duty to monitor the spread of infections within the hospital [EvansEa:96]. There are legitimate demands for certain information from public health organizations, insurance companies, and other third-party payors. There may be legal injunctions to obtain data, for instance in accident settlements. There is a need for medical treatment data and the effects of such treatments for medical research and pharmaceutical development. An important application is drug surveillance, checking if new drugs have side effects not found during the clinical trials that led to their approval. Last, but not least, are the patients themselves, who have the right to know what is happening to them. However, sometimes the patient’s rights are abrogated by medical practitioners in the interest of the general well-being of a particular patient. Sometimes those rights are assigned to family members, who have assumed responsibility for the care of minors or senile patients [DonaldsonL:94].

Complexity.

We see that the EMR must serve a very wide variety of purposes. At the same time medical information cannot be as well structured as, say, banking or merchandising records. While the IRS has legitimate rights to survey some of our bank records, the variety of information seen in them is much simpler. The privacy of merchandising records is also a concern, and although the information in them rarely has the potential to be damaging, we expect that it will not be released with any personal identification. Unfortunately, personal identification is essential for many of the purposes in a medical record. For continuing treatment, for tracing the origin of an epidemic, for understanding a delayed effect of a drug treatment, etc., personal identification is crucial to information linkage. Even when, say for medical research, anonymous data is adequate, investigators, legitimate or not, still have means to identify individuals. For instance, in an anonymized research record the dates of visits to a clinic will be stored to understand the temporal course of a disease. A visit pattern is likely to be unique for any individual. Matching this pattern against the relatively public records of the clinic’s operation is certain to identify particular patients.
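The re-identification risk described above can be illustrated with a minimal sketch. All names, identifiers, and dates below are hypothetical; the point is only that an exact match on a set of visit dates may be enough to link an anonymized record to a named entry in a clinic's operational log.

```python
from datetime import date

# Hypothetical anonymized research records: study ID -> set of visit dates.
anonymized_visits = {
    "subject-A": {date(2000, 1, 3), date(2000, 2, 14), date(2000, 5, 9)},
    "subject-B": {date(2000, 1, 3), date(2000, 3, 1)},
}

# Hypothetical, relatively public clinic log: name -> set of appointment dates.
clinic_log = {
    "Patient X": {date(2000, 1, 3), date(2000, 2, 14), date(2000, 5, 9)},
    "Patient Y": {date(2000, 1, 3), date(2000, 3, 1)},
    "Patient Z": {date(2000, 2, 14)},
}

def reidentify(anonymized, log):
    """Map each study ID to the names whose visit dates match its pattern exactly."""
    matches = {}
    for study_id, pattern in anonymized.items():
        matches[study_id] = [name for name, dates in log.items() if dates == pattern]
    return matches

linked = reidentify(anonymized_visits, clinic_log)
```

A study ID that maps to exactly one name has, in effect, lost its anonymity even though no name was stored in the research record.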

Reliability.

Security, and hence protection of privacy, cannot be obtained unless the underlying computer systems are reliable. When failures occur, not only is availability lost, but during repairs and downtimes the privacy of the records is easily compromised. If a large number of computer specialists have access to the computers for maintenance and repair, security is easily compromised, since these wizards rarely go through the full authentication and authorization process demanded during normal operations. These issues are common to all computer systems, so we will not discuss them further. We do observe that today, perhaps because of poor funding and the high availability demands on computer systems in healthcare, many systems are not run as carefully as they must be if security is to be assured. System reliability is not the focus of this chapter, but a reasonable level of reliability must be attained if security of information is to be achieved.

Whenever data are transmitted outside of an institution there is a chance that they may be misdirected or overheard. To protect against this type of loss, the contents of any transmission to a remote site should be encrypted, and decrypted upon receipt. Encryption technology for communication is routinely available, and has only modest effects on system performance and costs [Beth:95]. On the other hand, it is not effective to store medical records for the long term in encrypted form. The variety of accessors is such that means for decryption will have to be provided at many points, adding little protection and a high cost to assure that access is provided when and where needed.

Another aspect of reliability pertains to the stored data themselves. Errors in data collection, data entry, filing, and data manipulation will occur even in very well managed systems. There are some differences between data kept on paper or handled in the EMR, but the final error rate is not drastically affected. Few EMR system designers have included convenient provisions to mark possible errors and eventually correct them, whereas marking questionable entries on paper is easy. A good EMR can automatically distribute any corrections made to all destinations that have received erroneous data. Since there is less transcription in an EMR, fewer eyes have a chance of finding errors, so that errors are less likely to be caught in an EMR. Computer systems can automatically identify simple inconsistencies in patients’ histories, laboratory results, and the like. Historically, physicians are well aware of the limitations of data, and will rarely commit to procedures based on a single indication. Now, and even more so in the future, economic pressures are reducing the redundancy of laboratory testing and status recording that provided a safety margin in earlier systems.

Security and privacy are indirectly affected by the presence of errors in data records. Reporting misfiled data about a patient to an external destination can be embarrassing and even costly. Data as well as processing errors will be seen as failures to properly protect data.

2. System concepts.

We will now summarize the system concepts that underlie the protection of healthcare information. Basic to any approach is the need to define what information is to be protected, authenticate the people that have access to the records, and manage their authorization with respect to the data. In a subsequent section we will introduce the technical means for achieving the security objectives, but first we must address how the system must deal with people. Figure 1 illustrates the relationship among relevant concepts.

Figure 1: Components in Security and Protection of Privacy

Authentication.

Authentication of an individual requires that some personal identification be submitted. Today, entering a password into a computer is the most common way. There is tension between having secure passwords, which are lengthy and uncommon, and having passwords that are easily remembered. Uncommon passwords are often written down, perhaps kept in the drawer of the terminal desk. Common passwords, such as names of family members, pets, or birthdays, are easily guessed. Even if only one user of a computer system chooses a poor password, the contents of an entire system may be compromised [CastanoFMS:95].

In the near future identification cards will gain acceptance. They are combined with simple identification numbers that require little recall but still protect access in case of loss. Identification cards are less likely to be misplaced or left near the computer terminal if they are also needed to gain access to the buildings, the parking lots, etc. However, we will also have to deal with remote accessors where issuing individual cards is not feasible.

More stringent means of authentication employ biometric technologies that depend on unalterable physical characteristics of an individual. Methods that are being proposed to control authentication include automated checking of voice prints, fingerprints, facial features, retinal patterns, or hand dimensions [HolmesWM:91]. Devices for these methods are becoming routinely available, although they will not be found at every site where access to medical information is needed. For acceptance in critical settings these devices must demonstrate a very high reliability in practical situations, for instance the voice analysis must not deny access when practitioners express stress in their speech. Since identification is only one link in having secure systems a disproportionate investment in high-tech authentication is not warranted.

Domain of protection.

It is crucial to properly define the boundary of protection. In networked computing, as is common now, the boundary of protection is not simply the physical perimeter of a computer system; it extends to all the computer systems that share a common protection system. Such a virtual perimeter is best defined by a firewall [Cheswick:94], software that is intended to prevent both inappropriate access and inappropriate release of information. It is at the firewall that authentication is validated. A simple firewall may wrap one specific record system [Venema:92], several interoperating systems, or all computers within a healthcare enterprise. Sites outside of the perimeter may be accessed via the Internet, increasing the need for firewalls [ChapmanZ:95]. The complexity of providing protection depends on the scope of the system.

There may be multiple domains, each protected by their own firewalls, in a major institution. For instance, financial information may be segregated from patient care data. Some applications must span the domains, for instance, to justify billings to a third party information from the medical record is required, but the release of such information should be mediated, since the insurance company has no right to obtain information not related to the current case.

Within the health care system are many types of data, but central to our concern is the medical portion of the EMR. This portion is the most problematical, since much of it is relatively unstructured text. The text will contain both highly private information and information that must be made available for billing and external reporting. It can refer to multiple diseases, although the rules for release of information may differ among diseases. For instance, information about HIV infections must be dealt with more carefully than cardiac problems. Pregnancies, diabetes, trauma, etc. all have differing sensitivities to release of data. Keeping the information in the medical record disjoint is not practical. For proper healthcare the total picture is essential, so that a rigorous partitioning is inappropriate. We do want a nurse who takes care of a patient to be aware of any infections that can be transmitted, even if the current task is to deal with another problem.

Authorization.

Within a domain an authenticated accessor will have certain rights. Rights may pertain to certain files, or certain records. For instance, nurses on a ward should have access to all medical information for patients in the ward, while the physicians will need access to the patients under their care in various localities. The right to append information, such as entering orders, is more restricted. Authorization to actually change stored information is rare, since it inhibits the audit trail for decisions that have been made.

For convenience, a specific type of authorization may be assigned to groups having specific roles, say the billing clerks in an institution, so that the number of entries in an authorization table remains manageable. Then there must be a mapping table from individuals to such a group. It is inappropriate to assign an identification to multiple individuals. If that is done, and one member departs, new identification cards or passwords have to be given to all. The attendant costs and risky delays are worse than the cost of authenticating every individual.
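A minimal sketch of the two tables described above follows: an authorization table keyed by role, and a mapping table from individually authenticated accessors to roles. The role names, user identifiers, data categories, and rights are hypothetical and serve only to show the structure.

```python
# Hypothetical authorization table: role -> data category -> permitted actions.
ROLE_RIGHTS = {
    "ward_nurse": {"medical": {"read"}},
    "physician": {"medical": {"read", "append"}},
    "billing_clerk": {"billing": {"read", "append"}},
}

# Hypothetical mapping table: each individual has a distinct identification.
USER_ROLES = {
    "nurse_017": "ward_nurse",
    "dr_042": "physician",
    "clerk_003": "billing_clerk",
}

def is_authorized(user_id, category, action):
    """Return True if the user's role grants the action on the data category."""
    role = USER_ROLES.get(user_id)
    if role is None:
        return False  # unknown or revoked individuals receive no rights
    return action in ROLE_RIGHTS.get(role, {}).get(category, set())

def revoke(user_id):
    """Remove one departing individual; nobody else's identification changes."""
    USER_ROLES.pop(user_id, None)
```

Because rights attach to roles while identification stays individual, a departure requires deleting a single mapping entry rather than reissuing cards or passwords to the whole group.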

Where access is performed by a remote organization, say a clinic, the issuing of EMR identifications can be delegated to that site. Again, individuals should be properly identified, even when they share an identical authorization. Such a policy encourages responsible protection of information and is essential to provide an audit trail. Remote access does have additional risks. To mitigate them, additional constraints may be imposed. For instance, insurance companies may be restricted to have access to otherwise authorized information only during their working hours, to prevent unsupervised access. Emergency overrides will not be needed.

Authorizations follow professional conventions. Physicians and nurses who are bound by a code of ethics will receive broader rights than clerical personnel [AMA:94]. Staff working within an institution and receiving guidance will have fewer restrictions imposed on them than external staff [CPRI:96]. Where data access is not urgent, say for medical research, delays due to a more careful validation are acceptable. Data for research may also be transformed to reduce the risk of inadvertent disclosure [Sweeney:96].

The information defining assigned authorizations should be kept so that it can be easily inspected and updated when needed. That information also must be represented in such a way that the computer programs which enforce access can interpret the rights and assign them to authenticated individuals or roles as needed [GriffithsW:76]. The table containing the rights assigned to individuals and groups with respect to the types of data represents a major part of the institutional policy for security and protection.

Logging.

Authorized transactions within an EMR can easily be recorded or logged. The storage capacity of computer systems is such that transaction logs can be quite comprehensive, recording who accessed what, when, and how. When data leave the institution, the actual contents should also be recorded. For periodic audits, tools can be used to spot atypical activities, and in case of problems definite conclusions can be drawn.
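As a rough sketch of such a log and of one very simple audit tool, the fragment below records who accessed what, when, and how, and flags accessors whose activity exceeds a threshold. The in-memory list, the field names, and the count-based notion of "atypical" are assumptions made for illustration; a production log would be persistent and the audit criteria far richer.

```python
from collections import Counter
from datetime import datetime

access_log = []  # in-memory stand-in for a persistent, append-only log

def log_access(who, what, how, when=None):
    """Record who accessed what, when, and how."""
    access_log.append({"who": who, "what": what, "how": how,
                       "when": when or datetime.now()})

def atypical_accessors(log, threshold):
    """Return accessors whose number of logged accesses exceeds the threshold."""
    counts = Counter(entry["who"] for entry in log)
    return sorted(who for who, n in counts.items() if n > threshold)

# Hypothetical activity: one clerk reads four records, one physician reads one.
for record in ("rec-1", "rec-2", "rec-3", "rec-4"):
    log_access("clerk_003", record, "read")
log_access("dr_042", "rec-1", "read")

flagged = atypical_accessors(access_log, threshold=3)
```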

Here computer systems can perform much better than humans, who have fallible and often opinionated recall. To ensure that all data accesses are mediated by the security services, any essential transaction must be allowable, including exceptional requests. If systems do not have the capability to allow exceptions, users will use improper means, such as copying files and removing them physically, when a legitimate need exists. The log and resulting audit trail then become incomplete. If problems arise, legitimate and wrongful uses of improper methods are hard to distinguish.

Management of protection.

The decisions that define what types of information to protect from what classes of individuals, and to what extent to invest in protection, must be made at a high institutional level. It is the management of an institution who ultimately have responsibility when access fails or when a patient’s expectation of privacy is violated [Regan:95]. Once the policies are set, their execution is delegated to specialists. In our description we will assume that the execution of protection policies is delegated to an institutional security officer. Such a person maintains the communication between management and the computer and communication technologists who manage the actual software. The translation of policies to enforceable rules is always problematic. Not all desirable policies can be fully implemented. For instance, automating the policy that in an emergency case all data must be available requires that the computer can unambiguously recognize an emergency. The policy may be partially implemented by making all information available to emergency room personnel; but not all emergencies occur in the emergency room.

Having an individual on duty who is authorized to override restrictions is wise. Such overrides can also be logged, so that a complete audit trail is maintained. The security officer can establish such rules in order to best implement the institutional policies. Today, the responsibility for implementation is often assigned to technical personnel as a secondary responsibility. When, for instance, the database manager is responsible for security, very liberal rules are likely to be established, since the primary function of this person is to make data available, not to protect them from inappropriate access. Similar concerns arise when a networking manager is assigned the responsibility for security and privacy, since for that person the primary objective is to keep the system accessible, not to protect data from inappropriate access.

3. Technologies.

In order to provide security a number of technologies are in common use. We listed various means for authentication above, but also have to worry that transmissions are safe from intruders, that authorizations are obeyed, and that only appropriate information is released. We assume now that management policies are in place, and that operational responsibilities have been assigned to a security officer.

Cryptography.

Transmission of information, including the passwords or identifications needed for authentication, can be protected through encryption. Encryption causes a message to be transformed according to an encryption key. The encryption key can direct shuffles, boolean transforms, reversible multiplications, and the like. Cryptography can provide an arbitrarily high level of protection by lengthening the key. The difficulty of breaking encrypted information increases exponentially with the size of the encryption key. Software using <48?60>-bit keys has been in common use for a long time [KonheimEa:80]. With current high-performance computers data encrypted with such a key are decodable within a few days. Still, the information to be gained by breaking into an EMR rarely warrants even that effort. Cryptographic procedures with much longer keys are now becoming available. Existing and developing capabilities seem to be adequate for healthcare.
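The effect of key length on the effort of an exhaustive search can be made concrete with a little arithmetic. The sketch below assumes a brute-force attacker who must try, on average, half of all possible keys; the trial rate and the key lengths in the loop are hypothetical values chosen only to show the growth, not measurements of any real system.

```python
SECONDS_PER_DAY = 86_400

def keyspace(bits):
    """Number of distinct keys of the given length in bits."""
    return 2 ** bits

def average_search_days(bits, keys_per_second):
    """Expected days for an exhaustive search, which tries half the keys on average."""
    return keyspace(bits) / 2 / keys_per_second / SECONDS_PER_DAY

ASSUMED_RATE = 1e9  # hypothetical trial rate in keys per second

for bits in (40, 56, 64, 128):
    print(bits, f"{average_search_days(bits, ASSUMED_RATE):.3g} days")
```

Each additional key bit doubles the expected search time, which is why modest increases in key length outpace large increases in attacker hardware.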

Managing the keys is still a problem. The key used for encryption has to be made available to the destination, so that decryption can be performed. Loss of the key makes all of the information inaccessible, and the theft of a copy of the key makes encryption meaningless. Key losses can be dealt with by depositing copies of the keys with a responsible party, an escrow agent. Law enforcement agencies have been favoring schemes where keys would always be deposited with an escrow agent, so that encrypted files could be decoded when a legal search warrant is issued. It appears unlikely that they will get their wish, since such restrictions can easily be ignored by criminals and people suspicious of the government.

Public-key encryption uses two keys to overcome the problem of key management. Data to be transmitted are encrypted with two keys, one supplied by the sender and one by the receiver. A private version of the key is retained locally and derived keys are made publicly available. Encryption uses the local private and the remote public keys. Decryption requires the remote public and the local private keys. Public-key encryption is effective for modest data volumes, for sharing keys used to encrypt larger quantities of data, and for authenticating remote accessors [Diffie:88].
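One scheme that fits this description, in which each side combines its own private key with the other side's public key, is Diffie-Hellman key agreement. The sketch below is a toy illustration under that assumption: the modulus, base, and private values are hypothetical and far too small for real use, and the agreed secret would in practice be fed into a symmetric cipher rather than used directly.

```python
# Toy public parameters, assumed for illustration only (not secure sizes).
P = 2_147_483_647  # prime modulus, 2**31 - 1
G = 7              # public base

def public_key(private_key):
    """Derive the publishable key from the locally retained private key."""
    return pow(G, private_key, P)

def shared_secret(local_private, remote_public):
    """Combine the local private key with the remote public key."""
    return pow(remote_public, local_private, P)

# Hypothetical private keys, each known only to its owner.
sender_private, receiver_private = 123_456, 654_321
sender_public = public_key(sender_private)
receiver_public = public_key(receiver_private)

# Each side computes the same secret without ever transmitting a private key.
secret_at_sender = shared_secret(sender_private, receiver_public)
secret_at_receiver = shared_secret(receiver_private, sender_public)
```

Only the public values cross the network, so the key-distribution problem reduces to making sure each public key really belongs to the party it claims to represent.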

Although cryptography is an essential tool in protecting information from intruders, it only provides protection for well-defined tasks, and cannot distinguish among the many types of accessors that need to get to an EMR. All legitimate accessors to a record would need the same encryption key, and could not be distinguished. All others are viewed as potential enemies.

Firewalls.

Firewall software is now widely available, and is effective in defining the perimeter of an enterprise. Firewalls analyze the headers of incoming, and sometimes outgoing, information packets and can limit access to sites that have known Internet (IP) addresses. It has been hard to protect computer systems from intruders who masquerade as coming from legitimate Internet sites.
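The address-based part of such filtering can be sketched with Python's standard ipaddress module. The allow-list below is hypothetical and uses address ranges reserved for documentation; a real firewall inspects packet headers at the network layer, and, as noted above, a source address alone does not prove the origin of a request.

```python
import ipaddress

# Hypothetical allow-list of partner networks (documentation address ranges).
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.16/28"),
]

def admit(source_ip):
    """Return True if the claimed source address lies in an allowed network."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in ALLOWED_NETWORKS)
```

For example, `admit("192.0.2.17")` is accepted while `admit("203.0.113.5")` is refused, since only the former falls inside a listed network.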

Many products can also validate submitted authentication information. Mobile accessors, say physicians on travel, typically do not have a fixed IP address, and for those, individual authentication is essential. Since these identifications are submitted over public pathways, it is important that the transmissions are protected, so that potential intruders cannot copy legitimate name-and-password combinations.

Firewalls do not check the specific authorizations or the contents of requests submitted or results retrieved. For those aspects internal software, perhaps the database systems, must be responsible. If a legitimate user, either inadvertently or through subterfuge, obtains inappropriate information, the filtering provided by a firewall is of no help.

Partitioning of the medical record.

The authorization table relates accessors, be they individuals or groups, to categories of the stored data. Implicit in this approach is that the data to be presented or retrieved are partitioned into disjoint cells, so that for every authorization type the cells with the appropriate rights can be identified. The process of assigning categories to information involves every person who creates, enters, or maintains information. When there are few cells, originators of data can understand what is at stake, and can perform the categorization function adequately, although errors in filing will still occur. When there are many cells, the categorization task becomes onerous and error prone. When new applications are created, surveillance for more diseases is needed, or new collaborators must share the existing information system, the categorization task becomes impossible.
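A small sketch may help show why the number of cells grows so quickly. If every data item is labeled with the set of accessor groups allowed to see it, each distinct set defines one disjoint cell, so n overlapping groups can require up to 2^n - 1 cells. The item names and group names below are hypothetical.

```python
from collections import defaultdict

def partition_into_cells(item_access):
    """Group items into disjoint cells, one cell per distinct set of permitted groups."""
    cells = defaultdict(list)
    for item, groups in item_access.items():
        cells[frozenset(groups)].append(item)
    return dict(cells)

def max_cells(num_groups):
    """Upper bound on cells: one per non-empty subset of accessor groups."""
    return 2 ** num_groups - 1

# Hypothetical items, each labeled with the groups permitted to see it.
item_access = {
    "demographics": {"care", "billing", "research"},
    "procedure_code": {"care", "billing"},
    "progress_note": {"care"},
    "lab_result": {"care", "research"},
}
cells = partition_into_cells(item_access)
```

Here three groups already yield four distinct cells out of a possible seven, and each additional group of collaborators can double the bound.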

4. Problems to be addressed in the near future.

We have seen that we deal in the medical domain with many types of collaborators, all sharing access to information in the EMR. These collaborators are important in our complex enterprise, and cannot be viewed as enemies. The medical record cannot be partitioned into sections that are distinct for each group of authorized users. Such sections will overlap, and the number of possible combinations will be unmanageable [LuniewskiEa:93].

Today, security provisions for computing focus on controlling access.

Relying on access control makes the assumption that five conditions are fulfilled:

1. Authentication of all accessors

2. Perimeter control by use of a firewall or its equivalent

3. Authorizations that are complete and well-maintained

4. Secure transmission wherever physical access is not controlled

5. Partitioning of the information to match the authorization pattern

Unfortunately, in health care the last condition, namely perfect partitioning of the information into cells for disjoint access, is not realistic. We have many accessors whose needs overlap. We cannot expect that medical staff can foresee all the uses that medical information will serve, so that partitioning at the time of data collection is impossible. Delays to partition data later are not acceptable, since patient care demands that the record be accessible in a comprehensive form and up-to-date [Rindfleisch:97]. Furthermore, performing data partitioning to obtain security would greatly increase the cost of healthcare.

Changing patterns of outsourcing of services exacerbate the problem. Reorganizing healthcare databases to deal with developing needs for external access is costly and disruptive, since it will affect existing users and their applications. The problem has been recognized, but not yet addressed in industry; for instance, security concerns were cited as the prime reason for lack of progress in establishing virtual enterprises [HardwickS:96].

5. A Complementary Technology

The solution we provide to this dilemma is result checking [WiederholdBSQ:96]. In addition to the conventional tasks of access control the results of any information requests are filtered before releasing them to the requestor. We also check a large number of parameters about the release. This task mimics the manual function of a security officer when checking the briefcases of collaborating participants leaving a secure meeting, on exiting the secure facility. Note that checking of result contents is not performed in standard security processing. Multi-level secure systems may check for unwanted inferences when results are composed from data at distinct levels, but rely on level designations and record keys. Note that result checking need not depend on the sources of the result, so that it remains robust with respect to information categorization, software errors, and misfiling of data.
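As a rough sketch of what checking a result, rather than its source, might look like, the fragment below scans the text of a result for terms that a given class of requestor should not receive. The term list, the class name, and the word-matching approach are assumptions made for illustration and are not taken from the system described here.

```python
import re

# Hypothetical policy: terms that must not be released to a class of requestor.
BLOCKED_TERMS = {"billing": {"hiv", "psychiatric"}}

def check_result(requestor_class, result_text):
    """Return (ok, offending_terms) for a result that is about to be released."""
    words = set(re.findall(r"[a-z]+", result_text.lower()))
    offending = sorted(words & BLOCKED_TERMS.get(requestor_class, set()))
    return (not offending, offending)
```

Because the check looks only at what is about to leave, it catches a misfiled note or a software error just as readily as a badly categorized record; for instance, `check_result("billing", "Office visit, HIV test ordered")` reports the offending term regardless of which table the text came from.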

2. Filtering System Architecture

We incorporate result checking in a security mediator workstation, to be managed by a security officer. The security mediator system interposes security checking between external accessors and the data resources to be protected, as shown in Fig.1. It carries out functions of authentication and access control, to the extent that such services are not, or not reliably, provided by network and database services. Physically a security mediator is designed to operate on a distinct workstation, owned and operated by the enterprise security officer (S.O.). It is positioned as a pass gate within the enterprise firewall, if there is such a firewall. In our initial commercial installation the security mediator also provided traditional firewall functions, by limiting the IP addresses of requestors [WiederholdBD:98].

Fig.1. Functions provided by a TIHI/SAW Security Mediator

The mediator system and the source databases are expected to reside on different machines. Thus, since all queries that arrive from the external world, and their results, are processed by the security mediator, the databases behind a firewall need not be secure unless there are further internal requirements. When combined with an integrating mediator, a security mediator can also serve multiple data resources behind a firewall [Ullman:96]. Combining the results of a query requiring multiple sources prior to result checking improves the scope of result validation.

The supporting database systems can still implement their view-based protection facilities [GriffithsW:76]. These need not be fully trusted, but their mechanisms add efficiency.

Operation

Within the workstation is a rule-based system which investigates queries coming in and results to be transmitted to the external world. Any request and any result which cannot be vetted by the rule system is displayed to the security officer, for manual handling. The security officer decides to approve, edit, or reject the information. An associated logging subsystem provides an audit trail for all information that enters or leaves the domain. The log provides input to the security officer to aid in evolving the rule set, and increasing the effectiveness of the system.

The software of our security mediator is composed of modules that perform the following tasks:

1. Optionally (if there is no firewall): Authentication of the requestor

2. Determination of authorization type (clique) for the requestor

3. Processing of a request for information (pre-processing) using the policy rules

4. If the request is dubious: interaction with the security officer

5. Communication to internal databases (submission of certified request)

6. Communication from internal databases (retrieval of unfiltered results)

7. Processing of results (post-processing) using the policy rules

8. If the result is dubious: interaction with the security officer

9. Writing query, origin, actions, and results into a log file

10. Transmission of vetted information to the requestor
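The flow through steps 3 to 10 above can be sketched as a single function. This is an illustration under stated assumptions rather than the mediator's actual code: authentication and clique determination (steps 1 and 2) are taken as already done, the rules are represented as two hypothetical predicates, the security officer is a callback that returns True to approve, and the officer's option to edit material is omitted for brevity.

```python
def mediate(request, clique, rules, query_database, ask_officer, log):
    """Pass one request through steps 3 to 10; return the released result or None."""
    # Steps 3-4: pre-process the request; a dubious one goes to the officer.
    if not rules["request_ok"](clique, request):
        if not ask_officer("request", request):
            log.append((clique, request, "request rejected", None))
            return None
    # Steps 5-6: submit the certified request and retrieve unfiltered results.
    result = query_database(request)
    # Steps 7-8: post-process the result; a dubious one goes to the officer.
    if not rules["result_ok"](clique, result):
        if not ask_officer("result", result):
            log.append((clique, request, "result withheld", None))
            return None
    # Step 9: write query, origin, action, and result into the log.
    log.append((clique, request, "released", result))
    # Step 10: transmit the vetted information to the requestor.
    return result

# Hypothetical rules and stand-in services for a quick trial.
example_rules = {
    "request_ok": lambda clique, request: request.startswith("SELECT"),
    "result_ok": lambda clique, result: "HIV" not in result,
}
audit_log = []
released = mediate("SELECT charges", "billing", example_rules,
                   lambda request: "charges: 120",
                   lambda kind, item: False, audit_log)
```

Every path through the function ends in a log entry, so the audit trail stays complete whether material is released automatically, approved by the officer, or refused.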

Item 7, the post-processing of the results obtained from the databases, possibly integrated, is the critical additional function. Such processing is potentially quite costly, since it has to deal thoroughly with a wide variety of data. Applying such filters selectively, specifically for the problems raised in collaborations, as well as the capabilities of modern computers and text-processing algorithms, makes use of the technology feasible.

Historical information is important for disease management, but not for many billing tasks. It is obviously impossible to split the record into access categories that match every dimension of access. Even if that were possible, the cost and risks to the internal operations in a hospital or clinic would be prohibitive.

A rule-based system is used in TIHI to control the filtering, allowing the security policies to be set so that a reasonable balance of cost to benefit is achieved. It will be described in the next section.

Having rules, however, is optional. Without rules the mediator system will operate in fully paranoid mode. Each query and each result will be submitted to the security officer. The security officer will view the contents on-line, and approve, edit, or reject the material. Adding rules enables automation. The extent of automation depends on the coverage of the rule set. A reasonable goal is the automatic processing of, say, 90% of queries and 95% of responses.

Unusual requests, perhaps issued because of a new coalition, assigned to a new clique, will initially not have applicable rules, but can be immediately processed by the security officer. In time, simple rules can be entered to reduce the load on the officer.

Traditional systems, based on access control to precisely defined cells, require a long time before the data are set up, and when the effort is great, may never be automated. In many situations we are aware of, security mechanisms are ignored when requests for information are deemed to be important, but cannot be served by existing methods. Keeping the security officer in control allows any needed bypassing to be handled formally. This capability recognizes that in a dynamic, interactive world there will always be cases that are not foreseen or situations where the rules are too stringent. Keeping the management of exceptions within the system greatly reduces confusion, errors, and liabilities.

Even when operating automatically, the security mediator remains under the control of the enterprise since the rules are modifiable by the security officer at all times. In addition, logs are accessible to the officer, who can keep track of the transactions. If some rules are found to be too liberal, policy can be tightened. If rules are too stringent, as evidenced by an excessive load on the security officer, they can be relaxed or elaborated.

3. The Rule System

The rule system is composed of the rules themselves, an interpreter for the rules, and primitives which are invoked by the rules. The rules embody the security policy of the enterprise. They are hence not preset into the software of the security mediator.

In order to automate the process of controlling access and ensuring the security of information, the security officer enters rules into the system. These rules trigger analyses of requests, their results, and a number of associated parameters. The interpreting software uses these rules to determine the validity of every request and make the decisions pertaining to the disposition of the results. Auxiliary functions help the security officer enter appropriate rules and update them as the security needs of the organization change.

The rules are simple, short, and comprehensive. They are stored in a database local to the security mediator system with all edit rights restricted to the security officer. Some rules may overlap, in which case the most restrictive rule automatically applies. The rules may pertain to requestors, cliques of requestors having certain roles, sessions, database tables, or any combinations of these.

Rules are selected based on the authorization clique determined for the requestor. All the applicable rules will be checked for every request issued by the requestor in every session. All rules will be enforced for every requestor and the request will be forwarded to the source databases only if it passes all tests. Any request not fully vetted is posted immediately to the log and sent to the security officer. The failure message is directed to the security officer and not to the requestor, so that the requestors in such cases will not see the failure and its cause. This prevents the requestor from interpreting failure patterns and making meaningful inferences, or from rephrasing the request to try to bypass the filter [KeefeTT:89].
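
The request-vetting behavior just described can be sketched in a few lines. The sketch is illustrative only and rests on assumptions: the classes Request and Mediator, the rule no_names, and the reply strings are hypothetical and are not taken from TIHI.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Request:
    requestor: str
    clique: str
    text: str


# Assumed rule convention: None when the request passes,
# otherwise a short reason string describing the violation.
RequestRule = Callable[[Request], Optional[str]]


@dataclass
class Mediator:
    rules_by_clique: Dict[str, List[RequestRule]]
    log: List[str] = field(default_factory=list)
    officer_inbox: List[str] = field(default_factory=list)

    def vet(self, request: Request) -> bool:
        """True only if the request passes every rule of its clique."""
        rules = self.rules_by_clique.get(request.clique, [])
        failures = [r for r in (rule(request) for rule in rules) if r is not None]
        if rules and not failures:
            return True
        # Not fully vetted: record the reason for the officer only.
        reason = "; ".join(failures) or "no applicable rules"
        entry = f"{request.requestor} [{request.clique}]: {request.text!r}: {reason}"
        self.log.append(entry)
        self.officer_inbox.append(entry)
        return False

    def submit(self, request: Request) -> str:
        """Requestor-facing reply; it never reveals why a request was held."""
        return "forwarded" if self.vet(request) else "pending review"


def no_names(request: Request) -> Optional[str]:
    """Hypothetical rule: requests must not ask for names."""
    if "name" in request.text.lower():
        return "request asks for names"
    return None
```

The design point is the asymmetry between the two outputs: the log and the officer's inbox carry the reason for a failure, while the requestor receives the same neutral reply for every held request.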

The novel aspect of our approach is that the security mediator checks outgoing results as well. This is crucial since, from the security point of view, requests are inclusive, not exclusive selectors of content and may retrieve unexpected information. In helpful, user-friendly information systems getting more than asked for is considered beneficial, but from a security point of view being generous is risky. Thus, even when the request has been validated, the results are also subject to screening by a set of rules. As before, all rules are enforced for every requestor and the results are accessible only if they pass all tests. Again, if the results violate a rule, a failure message is logged and sent to the security officer but not to the requestor.
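
As an illustration of why a validated request can still yield a result that must be withheld, consider the following sketch. It assumes that results arrive as rows of named columns and that each clique has a set of permitted columns; the functions screen_results and release are hypothetical names chosen for the example.

```python
from typing import Dict, List, Optional, Sequence, Set

Row = Dict[str, object]


def screen_results(rows: Sequence[Row], allowed_columns: Set[str]) -> List[str]:
    """Return the violations found in a result; empty means releasable."""
    unexpected = sorted({col for row in rows for col in row} - allowed_columns)
    return [f"column not permitted: {col}" for col in unexpected]


def release(rows: Sequence[Row], allowed_columns: Set[str],
            officer_inbox: List[str]) -> Optional[List[Row]]:
    """Release a result only if it passes screening.

    On failure the violations go to the security officer and the
    requestor receives nothing (None), not the reason.
    """
    violations = screen_results(rows, allowed_columns)
    if violations:
        officer_inbox.extend(violations)
        return None
    return list(rows)
```

A broad request may have looked harmless when it was vetted, yet it can return a column, such as a place of employment, that the clique is not entitled to see. Only inspection of the result itself catches this.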

Primitives

The rules invoke executable primitive functions which operate on requests, data, the log, and other information sources. As new security functions and technologies appear, or if specialized needs arise, new primitives can be inserted in the security mediator for subsequent rule invocation. In fact, we do not expect to be the source of all primitives. We do hope that all primitives will be sufficiently simple that their correct function can be verified.

Primitives which have been used include:

· Assignment of a requestor to a clique

· Limit access for clique to certain database table segments or columns

· Limit request to statistical (average, median, ...) information

· Provide number of data instances (database rows) used in a statistical result

· Provide number of tables used (joins) for result for further checking

· Limit number of requests per session

· Limit number of sessions per period

· Limit requests by requestor per period

· Block requests from all but listed sites

· Block delivery of results to all but listed sites

· Block receipt of requests by local time at request site

· Block delivery of results by local time at delivery site

· Constrain request to data which is keyed to requestor name

· Constrain request to data which is keyed to request site name

· Filter all result terms through a clique-specific good-word dictionary

· Disallow results containing terms in a clique-specific bad-word dictionary

· Convert text by replacing identifiers with non-identifying surrogates [Sweeney:96]

· Convert text by replacing objectionable terms with surrogates

· Randomize responses for legal protection [Leiss:82]

· Extract text out of x-ray images (for further filtering) [WangWL:98]

· Notify the security officer immediately of failure reports

· Place failure reports only in the log

Not all primitives will have a role in all applications.
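
One way to picture the relationship between rules and primitives is a small registry of named functions that rules invoke with parameters. The sketch below is an assumption-laden illustration: the registry PRIMITIVES, the decorator primitive, the three sample primitives, and the context keys are all hypothetical and only loosely mirror items in the list above.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Context = Dict[str, Any]
PRIMITIVES: Dict[str, Callable[..., bool]] = {}


def primitive(name: str):
    """Register a function so that rules can invoke it by name."""
    def register(fn: Callable[..., bool]) -> Callable[..., bool]:
        PRIMITIVES[name] = fn
        return fn
    return register


@primitive("limit_requests_per_session")
def limit_requests_per_session(ctx: Context, maximum: int) -> bool:
    return ctx["requests_in_session"] <= maximum


@primitive("min_rows_in_statistic")
def min_rows_in_statistic(ctx: Context, minimum: int) -> bool:
    # Guards statistical results computed from too few data instances.
    return ctx["rows_used"] >= minimum


@primitive("listed_sites_only")
def listed_sites_only(ctx: Context, sites: set) -> bool:
    return ctx["site"] in sites


def check(rule_set: Sequence[Tuple[str, tuple]], ctx: Context) -> List[str]:
    """Return the names of the primitives that the context fails."""
    return [name for name, args in rule_set if not PRIMITIVES[name](ctx, *args)]
```

Keeping each primitive this small serves the stated hope that their correct function can be verified, and a new primitive from another source only needs to be registered under a name before rules can refer to it.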

Primitives can vary greatly in cost of application, although modern technology helps. Checking for terms in results is costly in principle, but modern spell-checkers show that it can be done fairly fast. For this task we create clique-specific dictionaries, by initially processing a substantial amount of approved results. In initial use the security officer will still get false failure reports, due to innocent terms that are not yet in the dictionary. Those will be incrementally added, so that in time the incidence of such failures will be minimal.

For example, we have in use a dictionary for ophthalmology, to allow authenticated researchers in that field to have access to patient data. That dictionary does not include terms that would signal, say, HIV infection or pregnancies, information which the patients would not like to see released to unknown research groups. Also, all proper names, places of employment, etc. are effectively filtered.
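
The good-word dictionary approach can be sketched as follows. This is a simplified illustration, assuming plain English text and a naive tokenizer; the function names build_dictionary and unknown_terms and the sample sentences are invented for the example.

```python
import re
from typing import Iterable, List, Set

# Naive tokenizer (an assumption): runs of letters and digits,
# optionally joined by apostrophes or hyphens.
TOKEN = re.compile(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*")


def tokens(text: str) -> List[str]:
    return [t.lower() for t in TOKEN.findall(text)]


def build_dictionary(approved_texts: Iterable[str]) -> Set[str]:
    """Seed a clique-specific dictionary from already approved results."""
    words: Set[str] = set()
    for text in approved_texts:
        words.update(tokens(text))
    return words


def unknown_terms(text: str, good_words: Set[str]) -> List[str]:
    """Terms not in the dictionary, in order of first appearance.

    A non-empty list means the result is held for the security officer.
    """
    seen: List[str] = []
    for term in tokens(text):
        if term not in good_words and term not in seen:
            seen.append(term)
    return seen
```

Because anything outside the dictionary is held, proper names and sensitive terms are blocked without being enumerated. An innocent term that causes a false failure report is simply added to the set by the security officer.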

Figure 2. Extract from a report to the Security Officer

Several of these primitives are designed to help control inference problems in statistical database queries [AdamW:89]. While neither we nor any feasible system can prevent leaks due to inference, we believe that careful management can reduce the probability [Hinke:88]. Furthermore, providing the tools for analysis, such as logging all accesses, will reduce the practical threat [Hinke:88], [Sweeney:97]. The primitive to enforce dynamic limits on access frequencies will often have to refer to the log, so that efficient access to the log, for instance by maintaining a large write-through cache for the log, will be important. Here again the functions of traditional database support and security mediation diverge, since database transactions are best isolated, whereas inference control requires history maintenance.
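
A dynamic limit on access frequency needs recent history to be cheaply available. The sketch below keeps a sliding window of timestamps per requestor in memory, standing in for the cached portion of the log; the class FrequencyLimiter and its parameters are assumptions made for illustration, and a real system would also write each access through to the persistent log.

```python
from collections import deque
from typing import Deque, Dict


class FrequencyLimiter:
    """Allow at most max_requests per requestor within period_seconds."""

    def __init__(self, max_requests: int, period_seconds: float) -> None:
        self.max_requests = max_requests
        self.period = period_seconds
        self._recent: Dict[str, Deque[float]] = {}

    def allow(self, requestor: str, now: float) -> bool:
        window = self._recent.setdefault(requestor, deque())
        # Drop accesses that have aged out of the period.
        while window and now - window[0] >= self.period:
            window.popleft()
        if len(window) >= self.max_requests:
            return False           # over the limit: refer to the officer
        # Design choice in this sketch: only allowed requests are counted.
        window.append(now)
        return True
```

Unlike an isolated database transaction, each decision here depends on what the same requestor did earlier, which is the divergence between database support and security mediation noted above.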

4. Logging

Throughout, the failures, as well as the request text and source, and actions taken by the security officer, are logged by the system for audit purposes. Having a security log which is distinct from the database log is important since:

· A database system logs all transactions, not just external requests, and is hence confusingly voluminous

· Most database systems do not log attempted and failed requests fully, because they appear not to have affected the databases

· Reasons for failure of requests in database logs are implicit, and do not give the rules that caused them.

We provide user-friendly utilities to scan the security log by time, by requestor, by clique, and by data source. Offending terms in results are marked.
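
A scan over such a security log might look like the sketch below. The record layout LogEntry and the function scan are hypothetical; they only illustrate filtering by time, requestor, clique, and data source, and do not reproduce the actual utilities.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class LogEntry:
    time: datetime
    requestor: str
    clique: str
    source: str                        # data source the request addressed
    request: str
    failed_rule: Optional[str] = None  # rule that caused a failure, if any


def scan(log: Iterable[LogEntry], *,
         since: Optional[datetime] = None,
         until: Optional[datetime] = None,
         requestor: Optional[str] = None,
         clique: Optional[str] = None,
         source: Optional[str] = None) -> List[LogEntry]:
    """Return the entries that match every filter that was given."""
    hits: List[LogEntry] = []
    for entry in log:
        if since is not None and entry.time < since:
            continue
        if until is not None and entry.time > until:
            continue
        if requestor is not None and entry.requestor != requestor:
            continue
        if clique is not None and entry.clique != clique:
            continue
        if source is not None and entry.source != source:
            continue
        hits.append(entry)
    return hits
```

Recording the failed rule explicitly in each entry addresses the third point in the list above: the reason for a failure is stated, not left implicit.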

No system, except one that provides complete isolation, can be 100% foolproof. The provision of security is, unfortunately, a cat-and-mouse game, where new threats and new technologies keep arising. Logging provides the feedback which converts a static approach to a dynamic and stable system, which can maintain an adequate level of protection. Logs will have to be inspected regularly to achieve stability.

Bypassing of the entire system and hence the log remains a threat. Removal of information on portable media is easy. Only a few enterprises can afford to place controls on all personnel leaving daily for home, lunch, or competitive employment. However, having an effective and adaptable security filter removes the excuse that information had to be downloaded and shipped out because the system was too stringent for legitimate purposes. Some enterprises are considering limiting internal workstations to be diskless. It is unclear how effective this approach will be outside of small, highly secure domains in an enterprise. Such a domain will then have to be protected with its own firewall and a security mediator as well, because collaboration between the general and highly secure internal domains must be enabled.

5. Current State and Further Work

Our initial demonstrations have been in the healthcare domain, and a commercial version of TIHI is now in use to protect records of genomic analyses in a pharmaceutical company. As the expectations for protection of the privacy of patient data are being solidified into governmental regulations we expect that our approach will gain popularity [Braithwaite:96]. Today the healthcare establishment still hopes that commercial encryption tools will be adequate for the protection of medical records, since the complexity of managing access requirements has not yet been faced [RindKSSCB:97]. Expenditures for security in medical enterprises are minimal [NRC:97]. Funding of adequate provisions in an industry under heavy economic pressures, populated with many individuals who do not attach much value to the privacy of others, will remain a source of stress.

Non-textual contents

Identifying information is routinely deleted from medical records that are disseminated for research and education. However, here a gap existed as well: X-ray, MRI, and similar images accompany many records, and these also include information identifying the patient. We have developed software which recognizes such text using wavelet-based decomposition and analysis, extracts it, and can submit it to the filtering system developed in TIHI. Information which is determined to be benign can be retained, and other text is effectively removed by omitting high-frequency components in the affected areas [WangWL:98].

We have also investigated our original motivating application area, namely manufacturing information. Here the simple web-based interfaces which are effective for the customer and the security officer interfaces in health care are not adequate. We have demonstrated interfaces for the general viewing and editing of design drawings and any attached textual information. In drawings significant text may be incorporated in the drawings themselves. When delivering an edited drawing electronically, we also have to assure that there is no hidden information. Many design formats allow undo operations, which would allow apparently deleted information to reappear.

Before moving to substantial automation for collaboration in manufacturing, we will have to understand the parameters for reliable filtering of such information better. However, as pointed out initially, even a fully manual security mediator will provide a substantial benefit to enterprises that are trying to institute shared efforts rapidly.

6. Advice.

However, break-ins still occur. Most of them are initiated via legitimate access paths, since the information in our systems must be shared with customers and collaborators. In that case the first three technologies provide no protection, and the burden falls on the mappings and the categorization of the information. Once users are permitted into the system, protection becomes more difficult.

In the near future the requirements for security of the medical record, be it on paper or in electronic form, will be increasing. Protection of what patients perceive to be their private information is becoming important. Legal obligations will arise as well, but limiting protection to what appears to be the legal minimum may well be unattractive. In any case, it will take some time before case law catches up with legal guidelines. The management of healthcare institutions must be prepared to define policies and supervise their implementation. Assigning responsibilities for security to database or network personnel, who have primary responsibilities of making data and communication available, will conflict with security concerns and is unwise. These people are promoted to their positions because they have a helpful attitude and know how to overcome problems of system failures and inadequacies. This attitude is inherently in conflict with corporate responsibilities for the protection of data. Outside vendors of products will not advertise the weaknesses of their approaches to security, especially in respect to the complexity of the requirements imposed on a medical record.

We have presented security mediation as an architectural function as well as a specific service. Architecturally, expanding the role of a gateway in the firewall from a passive filter to an active pass gate service allows concentration of the responsibility for security to a single node, owned by the security officer. Existing technologies, such as constraining authorization views over databases, encryption for transmission in networks, password management in operating systems, etc., can be managed via the security mediator node.

The specific, novel service presented here, result checking, complements traditional access control. We have received a patent to cover the concept. Checking results is especially relevant in systems with many types of users, including external collaborators, and complex information structures. In such settings the requirement that systems limited to access control impose, namely that all data are correctly partitioned and filed, is not achievable in practice. Result checking does not address all issues of security, of course, such as protection from erroneous or malicious updates, although it is likely that such attacks will be preceded by processes that extract information. A side-effect of result checking is that it provides a level of intrusion detection.

The rule-based approach allows balancing of the need for preserving data security and privacy and for making data available. Data which is too tightly controlled reduces the benefits of sharable information in collaborative settings. Rules which are too liberal can violate security and expectation of privacy. Having a balanced policy will require directions from management. Having a single focus for execution of the policy in electronic transmission will improve the consistency of the application of the policy.

Result filtering does not solve all security problems, of course. It still relies on a minimum level of reliability in the supporting systems. It cannot compensate when information is missing or not found because of misidentification. In general, a security mediator cannot protect from inadvertent or intentional denial of information by a mismanaged database system.

Acknowledgements

Research leading to security mediators was supported by an NSF HPCC challenge grant and by DARPA ITO via Arpa order E017, as a subcontract via SRI International. Steve Dawson was the PI at SRI. The commercial transition was performed by Maggie Johnson, Chris Donahue, and Jerry Cain under contracts with SST (www.2ST.com). Work on editing and filtering graphics is due to Jahnavi Akalla and James Z. Wang. Some of this material has appeared in earlier publications [Wiederhold:00].

References

[AMA:94] American Medical Association: “Confidentiality: Computers”; Code of Medical Ethics, American Medical Association, 1994.

[Beth:95] Thomas Beth: “Confidential Communication on the Internet”; Scientific American, December 1995, pp.88-91.

[CastanoFMS:95] S. Castano, M.G. Fugini, G. Martella, and P. Samarati: Database Security; Addison Wesley Publishing Company - ACM Press, 1995.

[ChapmanZ:95] D. Brent Chapman and Elizabeth D. Zwicky: Building Internet Firewalls; O’Reilly and Associates, 1995.

[CheswickB:94] William R. Cheswick and Steven M. Bellovin: Firewalls and Internet Security; Addison-Wesley, 1994.

[ClaytonEa:97] Paul Clayton (chair): For the Record; Protecting Electronic Health Information; National Academy Press, 1997.

[CPRI:96] Computer-based Patient Record Institute: Guidelines for Managing Information Security Programs; Work Group on Confidentiality, Privacy, and Security, CPRI, 1996.

[DickS:91] Richard S. Dick and Elaine B. Steen (eds.): The Computer-based Medical Record: An Essential Technology for Health Care; Institute of Medicine, National Academy Press, 1991.

[Didriksen:97] Tor Didriksen: “Rule-based Database Access Control – A Practical Approach”; Proc. 2nd ACM Workshop on Rule-based Access Control, 1997, pp.143-151.

[Diffie:88] Whitfield Diffie: “The First Ten Years of Public-Key Cryptography”; Proc. IEEE, Vol.76 No.5, May 1988, pp.560-577.

[DonaldsonL:94] Molla S. Donaldson and Kathleen L. Lohr (eds): Health Data in the Information Age: Use, Disclosure, and Privacy; Institute of Medicine, National Academy Press, 1994.

[EvansEa:86] R. Scott Evans et al.: “Computer Surveillance of Hospital-acquired Infections and Antibiotic Use”; J. of the AMA, Vol.256 No.8, 1986, pp.1007-1011.

[GriffithsW:76] Patricia P. Griffiths and Bradford W. Wade: “An Authorization Mechanism for a Relational Database System”; ACM Trans. on Database Systems, Vol.1 No.3, Sept.1976, pp.242-255.

[HardwickS:96] M. Hardwick, D.L. Spooner, T. Rando, and KC Morris: "Sharing Manufacturing Information In Virtual Enterprises"; Comm. ACM, Vol.39 no.2, pp.46-54, February 1996.

[HolmesWM:91] J.P. Holmes, L.J. Wright, and R.L. Maxwell: A Performance Evaluation of Biometric Identification Devices; Sandia Report SAND91-0276, Sandia National Laboratories, June 1991.

[JohnsonSV:95?] Johnson DR, Sayjdari FF, Van Tassel JP.: Missi security policy: A formal approach. Technical Report R2SPO-TR001, National Security Agency Central Service, July 1995.

[KonheimEa:80] A.G. Konheim, M.H. Mack, R.K. McNeil, B. Tuckerman: “The IPS Cryptographic Programs”; IBM Sys. J., Vol.19 No.2, 1980, pp.302-307.

[LandwehrHM:84] Carl E. Landwehr, C.L. Heitmyer, and J. McLean: “A Security Model for Military Message Systems”; ACM Trans. on Computer Systems, Vol.2 No.3, Aug. 1984, pp.198-222.

[LuniewskiEa:93] Luniewski, A. et al. "Information organization using Rufus" SIGMOD '93, ACM SIGMOD Record, June 1993, vol.22, no.2 p. 560-1

[QianW:97] Qian, XiaoLei and Gio Wiederhold: "Protecting Collaboration"; abstract for IEEE Information Survivability Workshop, ISW'97, Feb.1997, San Diego.

[Regan:95] Priscilla M. Regan: Legislating Privacy, Technology, Social Values, and Public Policy; University of North Carolina Press, 1995.

[RindKSSCB:97] David M. Rind, Isaac S. Kohane, Peter Szolovits, Charles Safran, Henry C. Chueh, and G. Octo Barnett: "Maintaining the Confidentiality of Medical Records Shared over the Internet and the World Wide Web"; Annals of Internal Medicine 15 July 1997. 127:138-141.

[Rindfleisch:97] Thomas C. Rindfleisch: “Privacy, Information Technology, and Health Care”; Comm. ACM, Vol.40 No.8, Aug.1997, pp.92-100.

[SchaeferS:95] M. Schaefer, G. Smith: “Assured discretionary access control for trusted RDBMS”; in Proceedings of the Ninth IFIP WG 11.3 Working Conference on Database Security, 1995:275-289.

[Seligman:99] Len Seligman, Paul Lehner, Ken Smith, Chris Elsaesser, and David Mattox: "Decision-Centric Information Monitoring"; Jour. of Intelligent Information Systems (JIIS), Vol.14, No.1.; also at http://www.mitre.org/pubs/edge/june_99/dcim.doc

[Sweeney:96] Latanya Sweeney: "Replacing personally-identifying information in medical records, the SCRUB system"; Cimino, JJ, ed. Proceedings, Journal of the American Medical Informatics Association, Washington, DC: Hanley & Belfus, 1996, Pp.333-337.

[Sweeney:97] Latanya Sweeney: "Guaranteeing anonymity when sharing medical data, the DATAFLY system"; Proceedings, Journal of the American Medical Informatics Association, Washington DC, Hanley & Belfus, 1997.

[Ullman:97?] Jeffrey Ullman: Information Integration Using Logical Views; International Conference on Database Theory (ICDT '97) Delphi, Greece, ACM and IEEE Computer Society, 1997.

[Venema:92] Wietse Venema: “TCP wrapper: Network Monitoring, Access Control, and Booby Traps”; Proc. 3rd Usenix Security Symp., Baltimore MD, 1992.

[WangWL:98] James Z. Wang, Gio Wiederhold and Jia Li: "Wavelet-based Progressive Transmission and Security Filtering for Medical Image Distribution"; in Stephen Wong (ed.): Medical Image Databases; Kluwer publishers, 1998, pp.303-324.

[WiederholdBC:98] Gio Wiederhold, Michel Bilello, and Chris Donahue: "Web Implementation of a Security Mediator for Medical Databases"; in T.Y. Lin and Shelly Qian: Database Security XI, Status and Prospects, IFIP / Chapman & Hall, 1998, pp.60-72.

[WiederholdBSQ:96] Gio Wiederhold, Michel Bilello, Vatsala Sarathy, and XiaoLei Qian: "A Security Mediator for Health Care Information"; Journal of the AMIA, issue containing the Proceedings of the 1996 AMIA Conference, Oct. 1996, pp.120-124.

[Wiederhold:00] Gio Wiederhold: “Protecting Information when Access is Granted for Collaboration”; to appear in Springer Verlag Volume<>
