CS145 Lecture Notes -- Relational Algebra

Steps in creating and using a (relational) database:
1. Design schema

2. Create schema (using DDL)

=> Now have populated database

4. Repeat: execute queries and updates on the database

Database Query Languages

• Get all students with GPA > 3.7 who applied to Berkeley and San Diego and nowhere else.
• Get all humanities departments at campuses in southern California with < 1000 applicants.
• Get the campus with highest average accept rate over the last five years.

• Some questions are easy to pose, some are not.
• Some questions are easy for DBMS to answer, some are not.
• "Query language" also used to update the database.

Relational Query Languages

• Formal: relational algebra, relational calculus, Datalog

• Actual: SQL, Quel

• In all languages, a query is executed over a set of relations, get a relation as the result.

Relational Algebra

Example schema:
```   Student(ID, name, address, GPA, SAT)        // ID is key
Campus(location, enrollment, rank)          // location is key
Apply(ID, location, date, major, decision)  // (ID,location) is key

```

Simplest query: relation name

Query "Campus" returns all tuples in Campus relation.

Relational algebra operators used to combine, filter, etc. the relations:

SELECT operator

(Example: Students with GPA > 3.7)
```

```
(Example: Students with GPA > 3.7 and SAT < 1300)
```

```
(Example: Applications to S.C. geology major)
```

```
** General case: SELECT[C] (R)

where C can include attribute names, constants, comparisons (=, <, etc.), and connectives (AND, OR, NOT)

PROJECT operator

(Example: ID and date of all applications)
```

```
PROJECT eliminates columns while SELECT eliminates rows.

To do both, combine ("compose") operators:

(Example: name and address of Students with GPA > 3.7 or SAT > 1400)

```

```
SELECT produces a relation, PROJECT operates on that relation.

** General case: PROJECT[A1, A2, ..., Am] (E)

where E is any expression producing a relation

Question: Is it ever useful to compose two projection operators?

Example: PROJECT[enrollment] (PROJECT[location, enrollment] (Campus))

```

```

Question: Is it ever useful to compose two selection operators?

Example: SELECT[SAT < 1300] (SELECT[GPA > 3.7] (Student))

```

```

Duplicates

Example: PROJECT[date,decision] (Apply) can produce many tuples with same value
• Relational algebra semantics says remove duplicates.
• SQL does not - one difference between formal and actual query languages

Cartesian Product (cross-product)

Campus X Apply

Schema of result is schema(Campus) union schema(Apply)

```

```
• Naming conflicts: prepend relation name
• One tuple in result for each pair of tuples in Campus, Apply

Formally: R1 X R2 = {t | t = <t1,t2> and t1 in R1 and t2 in R2}

Question: Looks odd to glue unrelated tuples together. Why use it?

```

```
(Example: Names and addresses of all students with GPA > 3.7 who applied to CS major and were rejected)
```

```
Question: Can we write it in a different way?
```

```

Natural Join

• Enforces equality on all attributes with same name
• Eliminates one copy of duplicate attributes

(Show schema of Campus JOIN Apply)

```

```
(Example: Names and addresses of all students with GPA > 3.7 who applied to CS major and were rejected)
```

```
(Example: Names and addresses of all students with GPA > 3.7 who applied to CS major at campus with enrollment < 15,000 and were rejected)
```

```
** General case:
E1 JOIN E2 = PROJECT[schema(E1) U schema(E2)] (SELECT[E1.A = E2.A and E1.B = E2.B and ...] (E1 X E2))
```

```
Need to be careful -- suppose we have:
```   Student(ID, name, address, GPA, SAT)
Campus(name, enrollment, rank)
```
"Student JOIN Campus" doesn't make sense.

Theta Join

E1 JOIN[C] E2 = SELECT[C] (E1 X E2)
```

```
• DBMS often implements theta-join as basic operation.
• Use of term "join" in implementation circles usually refers to theta-join or sometimes to cross-product.

Union

Example schema:
```   Student(ID, name, address, GPA, SAT)
OutofStateStudent(ID, name, address, GPA, SAT, state)
ForeignStudent(ID, name, address, GPA, SAT, country)
```

(Example: List name and address of all students)

Question: Can we do it with cross-products or joins?

```

```
For union operator:
• Schemas must match exactly.
• Duplicates eliminated (as usual).
(Example: List name and address of all students)
```

```

Difference

Back to original schema:
```   Student(ID, name, address, GPA, SAT)        // ID is key
Campus(location, enrollment, rank)          // location is key
Apply(ID, location, date, major, decision)  // (ID,location) is key
```
(Example: find ID's of all students who didn't apply anywhere)
```

```
Question: What if want name of students?
```

```
For difference operator:
• Schemas must match exactly, duplicates eliminated

Intersection

As expected. Note E1 INTERSECT E2 = E1 - (E1 - E2)

Renaming

Suppose we have:
```   Student(ID, name, address)
Parent(child-ID, parent-name)
```
(Example: want list of all student and parent names)
```

```
(Example: want names of all pairs of students who live at same address)
(Example: no dups)
```

```
** General case: RENAME[R(A1, A2, ..., Am)] (E) or RENAME[R] (E)

Summary of Relational Algebra

Basic:
```   E ::= R
|  SELECT[C] (E)
|  PROJECT[A1, A2, ..., An] (E)
|  E1 X E2
|  E1 U E2
|  E1 - E2
|  RENAME[R(A1, A2, ..., Am)] (E)

```
Abbreviations:
```      |  E1 JOIN E2
|  E1 JOIN[C] E2
|  E1 INTERSECT E2
```
```

```