Database Systems: The Complete Book
Solutions for Chapter 5

Solutions for Section 5.2
Solutions for Section 5.3
Solutions for Section 5.4
Solutions for Section 5.5

Solutions for Section 5.2

Exercise 5.2.1(a)

PI_model( SIGMA_{speed >= 1000} ) (PC)

Exercise 5.2.1(f)

The trick is to theta-join PC with itself on the condition that the hard disk sizes are equal. That gives us tuples that have two PC model numbers with the same value of hd. However, these two PC's could in fact be the same, so we must also require in the theta-join that the model numbers be unequal. Finally, we want the hard disk sizes, so we project onto hd.

The expression is easiest to see if we write it using some temporary values. We start by renaming PC twice so we can talk about two occurrences of the same attributes.

R1 = RHO_{PC1} (PC)
R2 = RHO_{PC2} (PC)
R3 = R1 JOIN_{PC1.hd = PC2.hd AND PC1.model <> PC2.model} R2
R4 = PI_{PC1.hd} (R3)

Exercise 5.2.1(h)

First, we find R1, the model-speed pairs from both PC and Laptop. Then, we find from R1 those computers that are ``fast,'' at least 133Mh. At the same time, we join R1 with Product to connect model numbers to their manufacturers and we project out the speed to get R2. Then we join R2 with itself (after renaming) to find pairs of different models by the same maker. Finally, we get our answer, R5, by projecting onto one of the maker attributes. A sequence of steps giving the desired expression is:
R1 = PI_{model,speed} (PC) UNION PI_{model,speed} (Laptop)
R2 = PI_{maker,model} (SIGMA_{speed>=700} (R1) JOIN Product)
R3 = RHO_{T(maker2, model2)} (R2)
R4 = R2 JOIN_{maker = maker2 AND model <> model2} (R3)
R5 = PI_{maker} (R4)

Exercise 5.2.2

Here are figures for the expression trees of Exercise 5.2.1 Part (a) Part (f) Part (h). Note that the third figure is not really a tree, since it uses a common subexpression. We could duplicate the nodes to make it a tree, but using common subexpressions is a valuable form of query optimization. One of the benefits one gets from constructing ``trees'' for queries is the ability to combine nodes that represent common subexpressions.

Exercise 5.2.7

The relation that results from the natural join has only one attribute from each pair of equated attributes. The theta-join has attributes for both, and their columns are identical.

Exercise 5.2.9(a)

If all the tuples of R and S are different, then the union has n+m tuples, and this number is the maximum possible.

The minimum number of tuples that can appear in the result occurs if every tuple of one relation also appears in the other. Surely the union has at least as many tuples as the larger of R and that is, max(n,m) tuples. However, it is possible for every tuple of the smaller to appear in the other, so it is possible that there are as few as max(n,m) tuples in the union.

Exercise 5.2.10

In the following we use the name of a relation both as its instance (set of tuples) and as its schema (set of attributes). The context determines uniquely which is meant.

  1. PI_R(R JOIN S) Note, however, that this expression works only for sets; it does not preserve the multipicity of tuples in R. The next two expressions work for bags.

  2. R JOIN DELTA(PI_{R INTERSECT S}(S)) In this expression, each projection of a tuple from S onto the attributes that are also in R appears exactly once in the second argument of the join, so it preserves multiplicity of tuples in R, except for those that do not join with S, which disappear. The DELTA operator removes duplicates, as described in Section 5.4.

  3. R - [R - PI_R(R JOIN S)] Here, the strategy is to find the dangling tuples of R and remove them.

Return to Top

Solutions for Section 5.3

Exercise 5.3.1

As a bag, the value is {700, 1500, 866, 866, 1000, 1300, 1400, 700, 1200, 750, 1100, 350, 733}. The order is unimportant, of course. The average is 959.

As a set, the value is {700, 1500, 866, 1000, 1300, 1400, 1200, 750, 1100, 350, 733}, and the average is 967. H3>Exercise 5.3.4(a) As sets, an element x is in the left-side expression

(R UNION S) UNION T

if and only if it is in at least one of R, S, and T. Likewise, it is in the right-side expression

R UNION (S UNION T)

under exactly the same conditions. Thus, the two expressions have exactly the same members, and the sets are equal.

As bags, an element x is in the left-side expression as many times as the sum of the number of times it is in R, S, and T. The same holds for the right side. Thus, as bags the expressions also have the same value.

Exercise 5.3.4(h)

As sets, element x is in the left side

R UNION (S INTERSECT T)

if and only if x is either in R or in both S and T. Element x is in the right side

(R UNION S) INTERSECT (R UNION T)

if and only if it is in both R UNION S and R UNION T. If x is in R, then it is in both unions. If x is in both S and T, then it is in both union. However, if x is neither in R nor in both of S and T, then it cannot be in both unions. For example, suppose x is not in R and not in S. Then x is not in R UNION S. Thus, the statement of when x is in the right side is exactly the same as when it is in the left side: x is either in R or in both of S and T.

Now, consider the expression for bags. Element x is in the left side the sum of the number of times it is in R plus the smaller of the number of times x is in S and the number of times x is in T. Likewise, the number of times x is in the right side is the smaller of

  1. The sum of the number of times x is in R and in S.
  2. The sum of the number of times x is in R and in T.
A moment's reflection tells us that this minimum is the sum of the number of times x is in R plus the smaller of the number of times x is in S and in T, exactly as for the left side.

Exercise 5.3.5(a)

For sets, we observe that element x is in the left side

(R INTERSECT S) - T

if and only if it is in both R and S and not in T. The same holds for the right side

R INTERSECT (S-T)

so as sets, the equivalence holds.

Now suppose we have bags. Let R = {x}, S = {x,x}, and T = {x}. Then R INTERSECT S = {x}, and the left side is {x}-{x}, or EMPTYSET.

However, on the right, S-T = {x}, so the right side is {x} INTERSECT {x}, which is {x}.

Return to Top

Solutions for Section 5.4

Exercise 5.4.1(a)

{(1,0,1), (5,4,9), (1,0,1), (6,4,16), (7,9,16)}.

Exercise 5.4.1(c)

The result is the list [(0,1), (0,1), (2,3), (2,4), (3,4)].

Exercise 5.4.1(e)

{(0,1), (2,3), (2,4), (3,4)}.

Exercise 5.4.1(g)

{(0,2), (2,7), (3,4)}.

Exercise 5.4.1(k)

A B C
01NULL
234
234
01NULL
24NULL
34NULL

Exercise 5.4.2(a)

When a relation has no duplicate tuples, DELTA has no effect on it. Thus, DELTA is idempotent.

Exercise 5.4.2(b)

The result of PI_L is a relation over the list of attributes L. Projecting this relation onto L has no effect, so PI_L is idempotent.

Exercise 5.4.3

If we use set semantics, it is not hard to get the effect of PI_{A,A} using only the operators of Section 5.1. The idea is to project R onto A, take the product of that relation with itself, and select for equality. As a sequence of steps:
     R1(A) := PI_A(R)
     R2(A,A1) := R1 * RHO_{S(A1)}(R1)
     ANS(A,A1) := SIGMA_{A=A1}(R2)

Under bag semantics, it is not possible. The intuitive idea behind the proof is that:

  1. Focus on the case when R consists only of two tuples, each of which is (0,1). e neec to produce the result {(0,0), (0,0)}.

  2. Since joins can be expressed as products, selections, and projections, let us assume that the only operations used are selection, projection, product, union, intersection, and difference.

  3. We can never get a relation with tuples that have two 0's in different components without using a product. However, then we get at least four identical tuples (or none, if we have eliminated the 0's) with two 0's.

  4. No operation can distinguish among them, so they all survive any operation, or none do.

  5. Thus, we can never produce a result with exactly two tuples with two 0 components.

Return to Top

Solutions for Section 5.5

Exercise 5.5.1(a)

SIGMA_{speed<1000 AND price>1500} (PC) = EMPTYSET

Exercise 5.5.1(d)

This complex expression is best seen as a sequence of steps in which we define temporary relations R1 through R4 that stand for nodes of expression trees. Here is the sequence:
     R1(maker, model, speed) := PROJ_{maker,model,speed}(Product JOIN PC)
     R2(maker, speed) := PROJ_{maker,speed}(Product JOIN Laptop)
     R3(model) := PROJ_{model}(R1 JOIN_{R1.maker=R2.maker AND
     R1.speed<=R2.speed} R2)
     R4(model) := PROJ_{model}(PC)
The constraint is that R4 is a subset of R3.

In explanation, R1 is the set of maker-model-speed triples for PC's, and R2 is the set of maker-speed pairs for laptops. Note JOIN is the natural join. R3 is the set of models (of PC's) for which there is a laptop by the same manufacturer at least as fast. R4 is all the PC models. Note that R3 is surely a subset of R4, so the constraint that R4 be a subset of R3 is really asking that these sets be equal; i.e., every PC has a laptop by the same manufacturer that is at least as fast.

Return to Top