=========================================================================== CS 346 Fall '98 - Prof. Widom - Handout #32 LECTURE NOTES - Query processing lecture #4 ============================================================================ Topic: Some tricks to further speed up query processing Index-only operations --------------------- Compute query answer using indexes only Ex: SELECT price FROM product WHERE price < 25 B+ tree index on price Q: How to execute query without accessing data? Ex: SELECT gpa FROM student WHERE status = 'MS' AND gpa > 3.8 Indexes on status and gpa Q: How to execute query without accessing data? Ex: Same as previous except select name instead of gpa Q: How to execute _minimizing_ access to data? Join processing --------------- Joins are most complex, frequent, studied operation 2-way join problem definition: - R JOIN[p] S - p is "R.A op S.B" - op is =, <, <=, >, >=, <> Note: join predicates other than p are "residual" Standard algorithms: - Nested loop/block - Sort-merge join - Hash join (various flavors) Nested-loop join with key ------------------------- for each t1 in R: for each t2 in S if p(t1,t2) then return combine(t1,t2) Suppose p is R.A=S.B, B is key for S, no relevant indexes Q: How to improve basic algorithm? A: Note: could choose commutative ordering of R JOIN S based on knowledge of key RID-based join -------------- for each in index for R.A use index for R.B to find all such that p(a,b) save in temporary table fetch actual tuples Q: Why is this approach more efficient than nested-loop index join? Q: What happens in special case of the following SELECT clause? SELECT R.A, S.B FROM R, S WHERE R.A op S.B A: Zig-Zag join ------------ B+ tree indexes on R.A and S.B, p is R.A=S.B Repeat until done: for next R.A find all matching S.B for next S.B find all matching R.A Diagram: Q: What is this algorithm very similar to? A: Auxiliary structures -------------------- Store extra information to speed up join Q: What classic trade-off is being made here? A: Join index ---------- Keep table of all RIDs participating in join Diagram: More complex: Keep table of all RIDs participating in join and satisfying certain selection conditions Even more complex: Keep table of all RIDs in result of some arbitrary query Pointer-based joins ------------------- Store RIDs for joining S tuples within R tuples Diagram: Q: What is the disadvantage besides the update cost? A: Q: When can we avoid variable-length records? A: Note: this structure also works well for conditions such as R.A=S.B AND R.A < 10 Bitmap joins ------------ Same as pointer-based except keep bitmap in each tuple identifying joining tuples. Diagram: Ex: SELECT ... FROM sales, store, item WHERE sales.storeID = store.storeID AND sales.itemID = item.itemID AND store.state = "CA" AND item.type = "clothing" Bitmap join indexes on store and item Diagram: (1) Bitmap OR within tables (2) Bitmap AND across tables (3) fetch item tuples Bc tree ------- Combined B+ tree index over different attributes Diagram: Note: contains strictly more information than a join index Q: What queries can it be used for beyond simple joins? A: