For all but the very simplest database application, there are many, many different relational database "designs" (schemas) that can be used to store the relevant data.

- Q: Is one schema better than another?
- A: Often, yes

Student(ID, name, address, HScode, HSname, HScity, cumGPA, priority) Apply(ID, campus, date, major)Suppose

*(show example)*

Formally:

For every pair of tuples t and u in Student: if t[cumGPA] = u[cumGPA] then t[priority] = u[priority]This is a "functional dependency" (FD): a constraint specified with the schema of a relation.

- Based on knowledge of the real world
- All data must adhere to it

A1, A2, ..., Am -> B1, B2, ..., Bn (commas may be omitted)states that for every pair of tuples t and u in R: if

We will abbreviate `A1, A2, ..., Am` as `AA` (or
"A-bar") and `B1, B2, ..., Bn` as `BB` (or "B-bar").

*(simple abstract example)*

- AA -> BB, where BB is all attributes of R
- no subset of AA satisfies (1), i.e., AA is minimal

Note that key is required to be minimal; introduce "superkey"

*(diagram)*

*(diagram)*

*(diagram)*

=> Most of the time we're interested in completely nontrivial FD's.

*Question: Can we also split the left-hand side?*

If AA -> BB then AA -> (BB U AA)

Find all attributes B in R such that A1, A2, ..., Am -> BThis set of attributes is the "closure" and is denoted {A1, A2, ..., Am}+

**Algorithm** for computing closure:

start with {A1, A2, ..., Am} repeat until no change: if current set of attributes includes LHS of a dependency, add RHS attributes to the set(Effectively applies combining and transitive rules until there's no more change.)

*Example:* closure of `{ID, HScode}` in `Student` given FD's:

ID -> name, address, cumGPA cumGPA -> priority HScode -> HSname, HScity

*Example:* apply rules above

When specifying FD's for a relation, we would like to specify a

*Question: How can we tell if one FD follows from others?*

*Example:* Does AA -> BB follow from a set S of FD's?

- Start with set of relations
- Define FD's (and therefore keys) for them based on real world
- Transform relations to "normal form" ("normalize" the relations)

A: This one is really bad:

Apply(ID, name, campus, sport, HScode, HScity)(Assume students play the same sports at every high school attended.)

Consider "Mary" (ID 123) who is applying to Berkeley and Santa Cruz. She played tennis, basketball, and volleyball at both of her high schools, Gunn (code 26) and Menlo-Atherton (code 28).

*Question: How many tuples in the relation?*

This relation exhibits three types of "anomalies":

- Redundancy
- Update Anomaly
- Deletion Anomaly

Good design:

- No anomalies
- Can reconstruct all original information

**Definition:** R1(A1, ..., Am) / R2(B1, ..., Bn) is a decomposition of
R(C1, ..., Ck) if {A1, ..., Am} U {B1, ..., Bn} = {C1, ..., Ck}

*(diagram)*

Idea of decomposition:

- R1 = PROJECT[A1, ..., Am] (R), eliminating duplicates
- R2 = PROJECT[B1, ..., Bn] (R), eliminating duplicates
- R1 NATURAL-JOIN R2 = R

*Example decomposition:*

Apply1(ID, name, campus) Apply2(ID, sport, HScode, HScity)

*Another example decomposition:*

Apply1(ID, name, HScode, HScity) Apply2(name, campus, sport)

**Given:** relation R and set of FD's for R

**Definition:** R is in BCNF with respect to its FD's if for every
nontrivial FD AA -> BB, AA contains a key.

*Question: Why does violating this requirement produce a "bad" relation?*

Student(ID, name, address, HScode, HSname, HScity, cumGPA, priority) FD's: ID -> name, address, cumGPA cumGPA -> priority HScode -> HSname, HScity Key: ID, HScode

*Question: Is the relation in BCNF?*

Each violation produces anomalies.

Apply(ID, campus, date, major)Can apply to campus multiple times for different majors, but can only apply to a campus once per day

FD's: ID, campus, date -> major Key: ID, campus, date

*Question: Is the relation in BCNF?*

Input: relation schema R and set of FD's for R (1) compute keys for R based on FD's (2) repeat until no more BCNF violations: (2a) pick any R' with AA ->> BB that violates BCNF (2b) decompose R' into R1(AA,BB) and R2(AA,CC) where CC is all attributes in R' except (AA U BB) (2c) compute FD's for R1 and R2 (2d) compute keys for R1 and R2 based on FD's

*Question: How can we compute keys in steps (1) and (2d)?*

*Question: How do we compute FD's in step (2c)?*

=> Final decomposed relations may be different depending on which violating FD is chosen in each iteration (step 2(a)), but all decompositions will be in BCNF.

Does BCNF guarantee a good decomposition?

- Removes anomalies? Yes
- Reconstructs original relation?
- Can we get too many tuples?
Consider R(A,B,C) with tuples (1,2,3) and (4,2,5)

Decompose into R1(A,B) with tuples (1,2) and (4,2) R2(B,C) with tuples (2,3) and (2,5)

"Reassembled" relation has four tuples => wrong

But BCNF would not decompose this way unless B -> C or B -> A

- Can we get too few tuples? No

- Can we get too many tuples?

- Set of relations without anomalies
- Can reconstruct original relation

**Definition:**

AA ->> BB is an MVD for relation R if: For all tuples t,u in R: If t[AA] = u[AA] then there exists a v in R such that: (1) v[AA] = t[AA] (2) v[BB] = t[BB] (3) v[CC] = u[CC] where CC is all attributes in R except (AA U BB)

MVD's are also called "tuple-generating dependencies."

*Example:*

` Apply(ID, campus, sport) `

Assume all students apply to each campus with all sports.

*Question: What are FD's?*

*Question: What is key?*

*Question: Is it in BCNF?*

*Question: What are MVD's?*

*(show example data to verify)*

Prove by showing (1), (2), (3) in MVD definition.

*Question: Are there any rules for FD's that do not apply for MVD's?*

**Definition:** R is in 4NF with respect to its MVD's if for
every nontrivial MVD AA ->> BB, AA contains a key.

**Note:** Since every FD is also an MVD, 4NF implies BCNF

*Question: What happens in the MVD definition if AA contains a key?*

Input: relation schema R and set of FD's and MVD's for R (1) compute keys for R based on FD's (2) repeat until no more 4NF violations: (2a) pick any R' with AA ->> BB that violates 4NF (2b) decompose R' into R1(AA,BB) and R2(AA,CC) where CC is all attributes in R' except (AA U BB) (2c) compute FD's and MVD's for R1 and R2 (2d) compute keys for R1 and R2 based on FD's

*Question: How do we compute MVD's in step (2c)?
*

*(run algorithm on Apply example)*