===========================================================================
CS 346 Fall '98 - Prof. Widom - Handout #32
LECTURE NOTES - Query processing lecture #4
============================================================================

Topic: Some tricks to further speed up query processing


Index-only operations
---------------------

Compute query answer using indexes only

Ex: SELECT price
    FROM product
    WHERE price < 25

    B+ tree index on price

Q: How to execute query without accessing data?


Ex: SELECT gpa
    FROM student
    WHERE status = 'MS' AND gpa > 3.8

    Indexes on status and gpa

Q: How to execute query without accessing data?


Ex: Same as previous except select name instead of gpa

Q: How to execute _minimizing_ access to data?


Join processing
---------------

Joins are most complex, frequent, studied operation

2-way join problem definition:

- R JOIN[p] S
- p is "R.A op S.B"
- op is =, <, <=, >, >=, <>

Note: join predicates other than p are "residual"

Standard algorithms:

- Nested loop/block
- Sort-merge join
- Hash join (various flavors)


Nested-loop join with key
-------------------------

for each t1 in R:
  for each t2 in S
    if p(t1,t2) then return combine(t1,t2)

Suppose p is R.A=S.B, B is key for S, no relevant indexes

Q: How to improve basic algorithm?

A:


Note: could choose commutative ordering of R JOIN S based on knowledge
      of key

RID-based join
--------------

for each <a,rid1> in index for R.A
  use index for R.B to find all <b,rid2> such that p(a,b)
    save <rid1,rid2> in temporary table
fetch actual tuples

Q: Why is this approach more efficient than nested-loop index join?


Q: What happens in special case of the following SELECT clause?
       
   SELECT R.A, S.B
   FROM R, S
   WHERE R.A op S.B

A:


Zig-Zag join
------------

B+ tree indexes on R.A and S.B, p is R.A=S.B

Repeat until done:
  for next R.A find all matching S.B
  for next S.B find all matching R.A

Diagram:


Q: What is this algorithm very similar to?

A:


Auxiliary structures
--------------------

Store extra information to speed up join

Q: What classic trade-off is being made here?

A:


Join index
----------

Keep table of all RIDs participating in join

Diagram:


More complex: Keep table of all RIDs participating in join and
satisfying certain selection conditions

Even more complex: Keep table of all RIDs in result of some arbitrary
query


Pointer-based joins
-------------------

Store RIDs for joining S tuples within R tuples

Diagram:


Q: What is the disadvantage besides the update cost?

A:


Q: When can we avoid variable-length records?

A:


Note: this structure also works well for conditions such as
      R.A=S.B AND R.A < 10

Bitmap joins
------------

Same as pointer-based except keep bitmap in each tuple identifying
joining tuples.

Diagram:


Ex: SELECT ...
    FROM  sales, store, item
    WHERE sales.storeID = store.storeID
    AND   sales.itemID = item.itemID
    AND   store.state = "CA"
    AND   item.type = "clothing"

Bitmap join indexes on store and item

Diagram:


(1) Bitmap OR within tables
(2) Bitmap AND across tables
(3) fetch item tuples

Bc tree
-------

Combined B+ tree index over different attributes

Diagram:


Note: contains strictly more information than a join index

Q: What queries can it be used for beyond simple joins?

A: