Proposed relation to capture this information:
Apply(ID, name, campus, sport, HScode, HScity)
Consider "Mary" (ID 123) who is applying to Berkeley and Santa Cruz. She played tennis, basketball, and volleyball at both of her high schools, Gunn (code 26) and Menlo-Atherton (code 28).
Question: How many tuples in the relation?
Apply
relation exhibits three types of anomalies:
Good designs:
Apply(ID,name,campus)
Question: Is this a good design or should we decompose?
Can have many duplicates of each
(ID,name)
ID
always has the same
name
(but not necessarily vice-versa)
ID-name
connection once
Boyce-Codd Normal Form says if A,B are both in relation R, then A must be a key => A-B connection stored only once.
BCNF decomposition says make a separate relation (A,B) and take B out of the original.
Apply(ID,campus,sport)
Question: Is this a good design or should we decompose?
Multiplicative effect: C campuses and S sports yields CxS tuples
Question: captured by FDs and BCNF?
Multivalued dependency "A ->> B" says a given A value always has every combination of B and C values (C is remaining attributes).
Fourth Normal Form says if A,B are both in relation R, then A must be a key => don't get multiplicative effect of B and C.
4NF decomposition says make a separate relation (A,B) and take B out of the original.
Student(ID, name, address, HScode, HSname, HScity, cumGPA, priority) Apply(ID, campus, date, major)
Student.priority
is determined by
Student.cumGPA
.
(show example)
Formally:
For every pair of tuples t and u in Student: if t[cumGPA] = u[cumGPA] then t[priority] = u[priority]This is a "functional dependency" (FD): a constraint specified with the schema of a relation.
A1, A2, ..., Am -> B1, B2, ..., Bn (commas may be omitted)states that for every pair of tuples t and u in R: if
t[A1,...,Am]
= u[A1,...,Am]
then t[B1,..,Bn] = u[B1,..,Bn]
.
We will abbreviate A1, A2, ..., Am
as AA
(or
"A-bar") and B1, B2, ..., Bn
as BB
(or "B-bar").
(simple abstract example)
Question: What are some functional dependencies for
Student
besides cumGPA -> priority
?
Question: What are some functional dependencies for
Apply?
(show abstract example)
Note:
(example)
(example)
(example)
=> Most of the time we're interested in completely nontrivial FD's.
Question: Can we also split the left-hand side?
Find all attributes B in R such that A1, A2, ..., Am -> BThis set of attributes is the "closure" and is denoted {A1, A2, ..., Am}+
Algorithm for computing closure:
start with {A1, A2, ..., Am} repeat until no change: if current set of attributes includes LHS of a dependency, add RHS attributes to the set(Effectively applies combining and transitive rules until there's no more change.)
Example: closure of {ID, HScode}
in Student
given FD's:
ID -> name, address, cumGPA cumGPA -> priority HScode -> HSname, HScityQuestion: How can we exploit closure to test whether a set of attributes is a key?
Related question: How can we find all keys given a set of FD's?
Example: apply rules above
When specifying FD's for a relation, we would like to specify a minimal set of completely nontrivial FD's such that all FD's that hold on the relation follow from the dependencies in this set. Sounds hard, but in practice this approach comes naturally.
Question: How can we tell if one FD follows from others?
Example: Does AA -> BB follow from a set S of FD's?
(diagram)
Idea of decomposition:
Example decomposition:
Student(ID, name, address, HScode, HSname, HScity, cumGPA, priority)decomposed to
Student1(ID, name, address, HScode, cumGPA, priority) Student2(HScode, HSname, HScity)(check criteria for decomposition)
Another example decomposition:
Student1(ID, name, address, HScode, HSname, HScity) Student2(name, HSname, cumGPA, priority)(check criteria for decomposition)
Given: relation R and set of FD's for R
Definition: R is in BCNF with respect to its FD's if for every nontrivial FD AA -> BB, AA contains a key.
Question: Why does violating this requirement produce a "bad" relation?
Example:
Student(ID, name, address, HScode, HSname, HScity, cumGPA, priority) FD's: ID -> name, address, cumGPA cumGPA -> priority HScode -> HSname, HScity Key: ID, HScode
Question: Is the relation in BCNF?
Each violation produces anomalies.
Apply(ID, campus, date, major)Can apply to campus multiple times for different majors, but can only apply to a campus once per day
FD's: ID, campus, date -> major Key: ID, campus, date
Question: Is the relation in BCNF?
Input: relation schema R and set of FD's for R (1) compute keys for R based on FD's (2) repeat until no more BCNF violations: (2a) pick any R' with AA -> BB that violates BCNF (2b) decompose R' into R1(AA,BB) and R2(AA,CC) where CC is all attributes in R' except (AA U BB) (2c) compute FD's for R1 and R2 (2d) compute keys for R1 and R2 based on FD's(diagram)
Question: How can we compute keys in steps (1) and (2d)?
Question: How do we compute FD's in step (2c)?
(run algorithm on
Student
example)
=> Final decomposed relations may be different depending on which violating FD is chosen in each iteration (step 2(a)), but all decompositions will be in BCNF.
Does BCNF guarantee a good decomposition?
Consider R(A,B,C) with tuples (1,2,3) and (4,2,5)
Decompose into R1(A,B) with tuples (1,2) and (4,2) R2(B,C) with tuples (2,3) and (2,5)
"Reassembled" relation has four tuples => wrong
But BCNF would not decompose this way unless B -> C or B -> A
Apply(ID, campus, major) // removed date from earlier exampleSuppose students can apply to several campuses and different majors, but must apply to the same set of majors at every campus.
Question: What are FD's?
Question: What is key?
Question: Is it in BCNF?
Question: Is it a "good" design?
MVD Definition:
AA ->> BB is an MVD for relation R if: For all tuples t,u in R: If t[AA] = u[AA] then there exists a v in R such that: (1) v[AA] = t[AA] (2) v[BB] = t[BB] (3) v[CC] = u[CC] where CC is all attributes in R except (AA U BB)' (show with picture; show implied fourth tuple)
MVD's are also called "tuple-generating dependencies."
Back to example: What are MVD's?
(show example data to verify)
Intuition: MVD's uncover situations where independent facts related to a certain object are being squished together in one relation.
Prove by showing (1), (2), (3) in MVD definition.
Question: Are there any rules for FD's that do not apply for MVD's?
Given: relation R and set of MVD's for R
Definition: R is in 4NF with respect to its MVD's if for every nontrivial MVD AA ->> BB, AA contains a key.
Note: Since every FD is also an MVD, 4NF implies BCNF
Question: What happens in the MVD definition if AA contains a key?
Input: relation schema R and set of FD's and MVD's for R (1) compute keys for R based on FD's (2) repeat until no more 4NF violations: (2a) pick any R' with AA ->> BB that violates 4NF (2b) decompose R' into R1(AA,BB) and R2(AA,CC) where CC is all attributes in R' except (AA U BB) (2c) compute FD's and MVD's for R1 and R2 (2d) compute keys for R1 and R2 based on FD's
Question: How do we compute MVD's in step (2c)?
(run algorithm on Apply
example)
Apply(ID, campus, date, major)
Question: What are FDs and keys?
Question: What is BCNF decomposition?
Question: Is this decomposition "good"?
Third Normal Form (3NF) is weaker than BCNF (and therefore weaker than 4NF). It permits
Apply
as above to remain undecomposed.
=> Material on 3NF (Section 3.6.6) is required reading for the course, not covered in lecture.
Student(ID, HScode, cumGPA, priority) ID -> cumGPA cumGPA -> priority ID -> priorityNote:
ID
is not a key.
Apply BCNF decomposition splitting first on ID -> cumGPA
Question: Is the resulting decomposition "good"?
Heuristic: "close" each FD before beginning decomposition
=> Overall, BCNF/4NF decomposition does not guarantee that all of the original FDs can be enforced on the individual decomposed relations.