CS145 Lecture Notes (3) -- Relational Algebra



Steps in creating and using a (relational) database:
  1. Design schema

  2. Create schema (using DDL)

  3. "Bulk load" initial data
    => Now have populated database

  4. Repeat: execute queries and updates on the database

Database Query Languages

Given a database, ask questions, get data as answers

Relational Query Languages


Relational Algebra

Example schema:
   Student(ID, name, address, GPA, sizeHS)     // ID is key
   Campus(location, enrollment, rank)          // location is key
   Apply(ID, location, date, major, decision)  // (ID,location) is key

Simplest query: relation name

Query "Campus" returns all tuples in Campus relation.

Relational algebra operators used to combine, filter, etc. the relations:

SELECT operator

(Example: Students with GPA > 3.7)



(Example: Students with GPA > 3.7 and sizeHS < 1000)



(Example: Applications to S.C. geology major)



** General case: SELECT[C] (R)

where C can include attribute names, constants, comparisons (=, <, etc.), and connectives (AND, OR, NOT)

PROJECT operator

(Example: ID and date of all applications)



PROJECT eliminates columns while SELECT eliminates rows.

To do both, combine ("compose") operators:

(Example: name and address of Students with GPA > 3.7 and sizeHS < 1000)




SELECT produces a relation, PROJECT operates on that relation.

** General case: PROJECT[A1, A2, ..., Am] (E)

where E is any expression producing a relation

Question: Is it ever useful to compose two projection operators?

Example: PROJECT[enrollment] (PROJECT[location, enrollment] (Campus))




Question: Is it ever useful to compose two selection operators?

Example: SELECT[sizeHS < 1000] (SELECT[GPA > 3.7] (Student))




Duplicates

Example: PROJECT[date,decision] (Apply) can produce many tuples with same value

Cartesian Product (cross-product)

Campus X Apply

Schema of result is schema(Campus) union schema(Apply)







  • Naming conflicts: prepend relation name
  • One tuple in result for each pair of tuples in Campus, Apply

    Formally: R1 X R2 = {t | t = <t1,t2> and t1 in R1 and t2 in R2}

    Question: Looks odd to glue unrelated tuples together. Why use it?

    
    
    
    
    
    
    (Example: Names and addresses of all students with GPA > 3.7 who applied to CS major and were rejected)
    
    
    
    
    
    
    Question: Can we write it in a different way?
    
    
    
    
    
    

    Natural Join

  • Enforces equality on all attributes with same name
  • Eliminates one copy of duplicate attributes

    (Show schema of Campus JOIN Apply)

    
    
    
    
    (Example: Names and addresses of all students with GPA > 3.7 who applied to CS major and were rejected)
    
    
    
    
    
    
    (Example: Names and addresses of all students with GPA > 3.7 who applied to CS major at campus with enrollment < 15,000 and were rejected)
    
    
    
    
    
    
    ** General case:
    E1 JOIN E2 = PROJECT[schema(E1) U schema(E2)] (SELECT[E1.A = E2.A and E1.B = E2.B and ...] (E1 X E2))
    
    
    
    
    
    Need to be careful -- suppose we have:
       Student(ID, name, address, GPA, sizeHS)
       Campus(name, enrollment, rank)
    
    "Student JOIN Campus" doesn't make sense.

    Theta Join

    E1 JOIN[C] E2 = SELECT[C] (E1 X E2)
    
    
    
    
    
  • DBMS often implements theta-join as basic operation.
  • Use of term "join" in implementation circles usually refers to theta-join or sometimes to cross-product.

    Union

    Example schema:
       Student(ID, name, address, GPA, sizeHS)
       OutofStateStudent(ID, name, address, GPA, sizeHS, state)
       ForeignStudent(ID, name, address, GPA, country)
    

    (Example: List name and address of all students)

    Question: Can we do it with cross-products or joins?

    
    
    
    
    
    For union operator: (Example: List name and address of all students)
    
    
    
    

    Difference

    Back to original schema:
       Student(ID, name, address, GPA, sizeHS)     // ID is key
       Campus(location, enrollment, rank)          // location is key
       Apply(ID, location, date, major, decision)  // (ID,location) is key
    
    (Example: Find ID's of all students who didn't apply anywhere)
    
    
    
    
    Question: What if want name of students?
    
    
    
    
    For difference operator:

    Intersection

    As expected. Note E1 INTERSECT E2 = E1 - (E1 - E2)

    Renaming

    Suppose we have:
       Student(ID, name, address)
       Parent(child-ID, parent-name)
    
    (Example: Print list of all student and parent names)
    
    
    
    
    (Example: Find names of all pairs of students who live at same address)
    (Example: no dups)
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    ** General case: RENAME[R(A1, A2, ..., Am)] (E) or RENAME[R] (E)

    Assigning to Temporaries

    (Example: previous query using sequence of assignments)
    
    
    
    
    
    
    
    
    
    
    
    
    
    


    Summary of Relational Algebra

    Basic:
       E ::= R
          |  SELECT[C] (E)
          |  PROJECT[A1, A2, ..., An] (E)
          |  E1 X E2
          |  E1 U E2
          |  E1 - E2
          |  RENAME[R(A1, A2, ..., Am)] (E)
    
    
    Abbreviations:
          |  E1 JOIN E2
          |  E1 JOIN[C] E2
          |  E1 INTERSECT E2
          |  TempName(A1, A2, ..., Ak) := E
          |  E1 ; E2