Data Security
Dorothy E. Denning and Peter J. Denning
Overview
Absolute data security is impossible (security is economics),
but we can try to ensure that the computer hardware and software are not the
weakest links. This paper examines four kinds of controls, each protecting a
different area: access controls, flow controls, inference controls, and
cryptographic controls.
Access Controls
Access controls govern which users may use which objects,
and in what ways. For example, records in a database, or files in a
filesystem, can be read or written by some users but not others. Three main
features are necessary:
- Users are properly identified.
  - Passwords ("something you know") are easy to implement, but also easy to
    thwart.
  - (The paper did not mention tokens or smartcards: "something you have".)
  - Biometrics ("something you are") are expensive.
- No one is snooping, whether by:
  - Network snooping: watching the data being accessed by another person
  - Data retrieval: stealing backup tapes or disks
  Use encryption to foil snoops.
- Access control information is highly privileged.
  - No normal user should be able to modify the list of who can access what
    data.
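Passwords are easy to implement, and the password file is exactly the kind of privileged access control information described above. The sketch below shows one common way to store and check passwords using only Python's standard library. It gives each user a random salt, hashes with a deliberately slow PBKDF2-HMAC-SHA256, and compares digests in constant time. The paper does not prescribe this scheme. The function names and the iteration count are illustrative assumptions.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed work factor for illustration; real deployments tune this upward.
ITERATIONS = 100_000

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) to store; draws a fresh 16-byte random salt if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)

# Usage
salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))    # False
```

Storing only salted hashes means that a stolen password file does not directly reveal the passwords. It does not, however, stop an attacker from guessing weak ones.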
Different kinds of systems have different requirements.
- Transaction Processing Systems
  - Users cannot write their own programs, so only the user interface needs
    to be protected.
  - The user interface can keep track of the user and issue queries only for
    records to which that user has access.
- General Purpose Systems
  - Protections in the runtime environment and in the hardware are usually
    needed.
  - Use the Principle of Least Privilege to protect against Trojan horses
    (and bugs!): compare setuid to rings of privilege.
  - Use access control lists (ACLs) and capabilities to grant access to
    objects.
  - Revoking access is easy with a centralized capability list (though most
    Unices don't allow this), and harder with distributed capabilities (in
    this case, you can "link" capabilities through the owner, so that the
    owner can cut the link).
  - The important question is the "safety problem": can user X ever read
    file Y? Unfortunately, this is undecidable in general (the halting
    problem can be reduced to it).
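Here is a minimal Python sketch that makes the contrast between ACLs and capabilities concrete. An ACL is stored with the object, so revoking a user's access is just an edit to that list. For capabilities, the sketch routes each capability through an indirection entry held by the owner, which is one way to "link" capabilities through the owner. The owner can then revoke every outstanding copy at once by deleting the entry. All class and method names are hypothetical. In a real system the operating system or hardware makes capabilities unforgeable, which plain Python objects are not.

```python
import secrets

class FileObject:
    def __init__(self, name: str, contents: str):
        self.name = name
        self.contents = contents
        # ACL: stored with the object, mapping user -> set of rights.
        self.acl: dict[str, set[str]] = {}

def acl_allows(obj: FileObject, user: str, right: str) -> bool:
    """ACL check: look the user up in the object's own list."""
    return right in obj.acl.get(user, set())

class Capability:
    """A ticket naming an owner's link entry plus the rights it confers.

    Assumption: in a real system the OS or hardware makes these unforgeable.
    """
    def __init__(self, link_id: str, rights: frozenset):
        self.link_id = link_id
        self.rights = rights

class Owner:
    """Holds the indirection table through which its capabilities are linked."""
    def __init__(self):
        self.links: dict[str, FileObject] = {}

    def grant(self, obj: FileObject, rights: set) -> Capability:
        link_id = secrets.token_hex(8)  # fresh link for this grant
        self.links[link_id] = obj
        return Capability(link_id, frozenset(rights))

    def revoke(self, cap: Capability) -> None:
        # Deleting the link invalidates every copy of this capability at once.
        self.links.pop(cap.link_id, None)

    def resolve(self, cap: Capability, right: str) -> FileObject:
        if right not in cap.rights:
            raise PermissionError(f"capability does not grant {right!r}")
        obj = self.links.get(cap.link_id)
        if obj is None:
            raise PermissionError("capability has been revoked")
        return obj

# Usage
payroll = FileObject("payroll", "salaries")
payroll.acl["alice"] = {"read"}
print(acl_allows(payroll, "alice", "read"))   # True
del payroll.acl["alice"]                      # ACL revocation: edit the list
print(acl_allows(payroll, "alice", "read"))   # False

owner = Owner()
cap = owner.grant(payroll, {"read"})
bobs_copy = cap                               # capabilities can be passed along
print(owner.resolve(bobs_copy, "read").name)  # payroll
owner.revoke(cap)                             # every holder loses access
try:
    owner.resolve(bobs_copy, "read")
except PermissionError as err:
    print(err)                                # capability has been revoked
```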
Flow Controls
Flow controls govern the ability of information to be
transmitted from one part of the system to another (or, ultimately, from one
user to another). The main idea is to assign a "security class" to each piece
of data, and to require that information never flows from a higher class to a
lower one.
- Often, this is very coarse-grained: processes have a security class, and
  can only read data of that class or lower, and only write to data of that
  class or higher.
  - This is a problem for processes that need to manipulate data of
    different classes; data tends to become overclassified.
  - Data flow analysis can alleviate this problem, but is potentially
    complicated and expensive. (taintperl does something like this.)
- Covert channels (transmitting data by some non-obvious means, such as run
  time, power consumption, or load average) are extremely hard to eliminate.
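A few lines of Python can sketch both the coarse-grained rules above and the finer-grained alternative of labeling individual values. Under the coarse rules, a process reads only at or below its class and writes only at or above it. The four level names are an assumed, simple linear ordering; real systems often use a lattice that also includes categories. When two labeled values are combined, the result takes the higher of the two classes, so information never moves to a lower class.

```python
from enum import IntEnum

class Level(IntEnum):
    """Assumed linear ordering of security classes."""
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3

def can_read(process: Level, data: Level) -> bool:
    # Read at the process's own class or lower.
    return data <= process

def can_write(process: Level, data: Level) -> bool:
    # Write at the process's own class or higher, so nothing it has read
    # can end up in a lower class.
    return data >= process

class Labeled:
    """A value tagged with its own security class (fine-grained tracking)."""
    def __init__(self, value: int, level: Level):
        self.value = value
        self.level = level

    def __add__(self, other: "Labeled") -> "Labeled":
        # A derived value is classified at the higher of its inputs' classes.
        return Labeled(self.value + other.value, max(self.level, other.level))

# Usage
print(can_read(Level.SECRET, Level.CONFIDENTIAL))   # True
print(can_write(Level.SECRET, Level.CONFIDENTIAL))  # False: would lower the class
mixed = Labeled(10, Level.UNCLASSIFIED) + Labeled(5, Level.SECRET)
print(mixed.level.name)                             # SECRET
public = Labeled(1, Level.UNCLASSIFIED) + Labeled(2, Level.UNCLASSIFIED)
print(public.level.name)                            # UNCLASSIFIED
```

Under the coarse rule, a SECRET process that computed `public` could only write it at SECRET or higher. Tracking labels per value avoids that overclassification. Neither approach does anything about covert channels.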
Inference Controls
Inference controls govern the ability of users to
determine specific information in a database when they are allowed to query
only for summary (statistical) information. The system must try to make the
cost of reconstructing the specific information prohibitive. Three types of
controls are possible:
- Restrict queries
  - Minimum query set control
    - If a query set contains only one record with a certain combination of
      characteristics, more specific queries can determine other information
      about that particular record.
    - It turns out, though, that restricting queries to query sets between a
      certain minimum and maximum size isn't good enough: an attacker can
      combine several permitted queries (a "tracker") to isolate one record.
  - Partitioned database
    - Store records in groups instead of individually; allow queries about
      groups only, not individuals.
    - Bad groupings can cause misleading statistics.
    - Dynamically changing databases can be expensive to continually
      regroup.
- Distort responses
  - Adding random values to the data can usually be defeated.
  - Introducing errors large enough to defeat an attacker usually produces
    bad statistics.
  - Data swapping is better, but it is usually hard to find appropriate
    records whose fields could be swapped.
- Random samples
  - Apply queries only to (pseudo-)random samples of the database.
  - Dynamic databases do not really benefit from this, apparently.
  - Combining this and minimum query set control works well.
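The following Python sketch shows why bounding the query-set size is not enough. `restricted_sum` answers SUM(salary) queries only when the query set has between K and N - K records. The upper bound stops an attacker from subtracting the sum over the complement of a small set from the total. `tracker_attack` then recovers the salary of the only person in a one-person department using four permitted queries. The statistical-database literature calls this construction a general tracker. The data, the choice K = 2, and all names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Person:
    name: str
    sex: str
    dept: str
    salary: int

# Invented example data: Judy is the only person in Legal.
DB = [
    Person("Alice", "F", "Eng", 120),
    Person("Bob", "M", "Eng", 100),
    Person("Carol", "F", "Sales", 90),
    Person("Dave", "M", "Sales", 85),
    Person("Eve", "F", "Eng", 130),
    Person("Frank", "M", "Ops", 70),
    Person("Grace", "F", "Ops", 75),
    Person("Heidi", "F", "Sales", 95),
    Person("Ivan", "M", "Eng", 110),
    Person("Judy", "F", "Legal", 140),
]
K = 2  # hypothetical minimum query-set size

Predicate = Callable[[Person], bool]

class QueryRefused(Exception):
    pass

def restricted_sum(pred: Predicate) -> int:
    """SUM(salary) over matching records, refused unless K <= |query set| <= N - K."""
    matches = [p for p in DB if pred(p)]
    if not K <= len(matches) <= len(DB) - K:
        raise QueryRefused(f"query set size {len(matches)} not allowed")
    return sum(p.salary for p in matches)

def tracker_attack(c: Predicate, t: Predicate) -> int:
    """Recover SUM over C using a tracker T whose query set size is in [2K, N - 2K].

    (C or T) and (C or not T) together cover every record, counting C twice,
    so their sums add up to SUM(all) + SUM(C); SUM(T) + SUM(not T) = SUM(all).
    """
    return (restricted_sum(lambda p: c(p) or t(p))
            + restricted_sum(lambda p: c(p) or not t(p))
            - restricted_sum(t)
            - restricted_sum(lambda p: not t(p)))

# Usage
def in_legal(p: Person) -> bool:
    return p.dept == "Legal"

def is_male(p: Person) -> bool:
    return p.sex == "M"

try:
    restricted_sum(in_legal)
except QueryRefused as err:
    print(err)                            # query set size 1 not allowed
print(tracker_attack(in_legal, is_male))  # 140 (Judy's salary)
```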
In addition, threat monitoring can watch log files for suspicious
queries (but then what about the privacy issues this raises?!).
Cryptographic Controls
Cryptographic controls govern who can read data
that isn't otherwise protected (by an operating system, for example). This
includes data being transmitted over a network and data stored on disks or
tapes. There are two major classes of encryption:
- Symmetric encryption
  - The key must be transmitted to the recipient separately, and in a secure
    manner.
  - Different schemes for key management exist (Kerberos, for example); key
    management is one of the most important parts of the system.
- Asymmetric encryption
  - Slower than symmetric encryption, but there is no need for a secure
    channel to transmit encryption keys.
  - There is still the issue of verifying the authenticity of public keys
    (PGP web of trust vs. SSL certification authorities).
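Whichever trust model is used, the basic check is the same. A key received over an untrusted channel must match something obtained over a trusted one. The Python sketch below assumes the trusted item is a SHA-256 fingerprint of the key's encoded bytes. This is a simplification: PGP and X.509 certificates define their own fingerprint and signature formats. The function names are hypothetical.

```python
import hashlib

def fingerprint(public_key: bytes) -> str:
    """Hex SHA-256 digest of the encoded public key (simplified format)."""
    return hashlib.sha256(public_key).hexdigest()

def accept_key(received_key: bytes, trusted_fingerprint: str) -> bool:
    """Accept the key only if it matches a fingerprint obtained out of band."""
    return fingerprint(received_key) == trusted_fingerprint.strip().lower()

# Usage: Alice reads her fingerprint to Bob over the phone (the trusted channel).
alice_key = b"example encoded public key"
trusted = fingerprint(alice_key)
print(accept_key(alice_key, trusted))                      # True
print(accept_key(b"key substituted in transit", trusted))  # False
```

The two trust models differ in who vouches for the binding between key and owner. In the web of trust, it is people you already trust. With certification authorities, it is a CA that signs a certificate.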
(Usually, the performance issue is mitigated by using a
hybrid approach: pick a random key, use it to encrypt the data with a
symmetric cipher, and transmit the ciphertext together with that key
encrypted under the recipient's public key.)
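The hybrid approach can be sketched end to end with Python's standard library, but only with toy stand-ins for real ciphers. The public-key step uses textbook RSA with two well-known Mersenne primes, 2^89 - 1 and 2^107 - 1. The symmetric step uses SHA-256 in counter mode as a keystream. Both choices are assumptions made purely for illustration, and both are insecure: the primes are public, and there is no padding and no integrity check. Real systems use vetted, padded public-key schemes and authenticated symmetric ciphers from a proper cryptography library.

```python
import hashlib
import os

# --- Toy public-key part: textbook RSA with well-known primes (INSECURE). ---
P = 2**89 - 1
Q = 2**107 - 1
N = P * Q                           # public modulus (about 196 bits)
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent; only the recipient holds it

def rsa_encrypt(m: int) -> int:
    return pow(m, E, N)             # sender needs only (N, E)

def rsa_decrypt(c: int) -> int:
    return pow(c, D, N)             # recipient uses the private exponent D

# --- Toy symmetric part: SHA-256 in counter mode as a keystream. ---
def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Encryption and decryption are the same XOR with the keystream.
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))

# --- Hybrid scheme: fresh symmetric key per message, wrapped with RSA. ---
def hybrid_encrypt(message: bytes) -> tuple[int, bytes, bytes]:
    session_key = os.urandom(16)    # 128-bit key, smaller than N
    nonce = os.urandom(12)
    wrapped_key = rsa_encrypt(int.from_bytes(session_key, "big"))
    return wrapped_key, nonce, xor_cipher(session_key, nonce, message)

def hybrid_decrypt(wrapped_key: int, nonce: bytes, ciphertext: bytes) -> bytes:
    session_key = rsa_decrypt(wrapped_key).to_bytes(16, "big")
    return xor_cipher(session_key, nonce, ciphertext)

# Usage
wrapped, nonce, ciphertext = hybrid_encrypt(b"attack at dawn")
print(hybrid_decrypt(wrapped, nonce, ciphertext))  # b'attack at dawn'
```

Only the short session key goes through the slow public-key operation. The message itself, however long, is handled by the fast symmetric step.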