CS346 Lecture Notes
Indexing and B+ Tree Review,
Overview of IX Component


Primary goal

Quickly find all records (RIDs) in relation R where R.A = v Question: For what SQL operations are these capabilities useful?


Some terminology: Basic picture of a sparse, primary index:


















Basic picture of a dense, secondary index:












Slightly different picture of a dense, secondary index:












=> In RedBase you will implement dense, secondary indexes

Two basic index types

Trees and Hashing

Trees Hashing

Instances
.
.



.......................................................................................



.......................................................................................

Search (A=v)
over N records

.



.......................................................................................



.......................................................................................

Insert or delete
record

.



.......................................................................................



.......................................................................................

Search (A<v)
over N records

.



.......................................................................................



.......................................................................................

  • RedBase this year: B+ trees
  • Possible extensions: linear hashing, extendible hashing, R trees


    B+ trees

    Variant of B-trees Question: What are two advantages of B+ trees over B-trees?
    (1)
    
    
    (2)
    
    
    
    Basic structure of a B+ tree:
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    Each B+ tree has a predefined maximum node capacity N (fanout <= N+1).

    Structure of an internal node:

    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    Structure of a leaf node:
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    Alternate structure for leaf nodes:
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    

    Algorithms

    Search (A = v)
    
    
    
    Search (A < v)
    
    
    
    Search (v1 < A < v2)
    
    
    
    Search (A > v)
    
    
    
    Insert (v, RID)
    
    
    
    
    Delete (v, RID)
    
    
    
    
    
    
    

    Halloween problem

    Consider the SQL update command:
       Update R
       Set R.A = R.A + 5
       Where R.A > 10
    
    Suppose a B+ tree is used to find tuples with R.A > 10:
       loop
         fetch next RID from index for R.A > 10
         find record, update R.A
         delete old R.A index entry for RID
         add new R.A index entry for RID
       end loop
    
    Question: What's wrong with this algorithm?
    
    
    
    Question: Do we have the same problem with hash indexes?
    
    
    
    

    RedBase IX component

    As always, the information given in lecture is a cursory overview and you should read the spec provided in the IX Component document very carefully.

    Basic setup of relations and indexes:

    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    Architecture:
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    

    Interface

    1. Startup: IX_Manager class -- one instance, PF_Manager is parameter

    2. Index routines:
           IX_Manager:: CreateIndex (file, index#, attr-info)
                        DestroyIndex
                        OpenIndex -> initialized index handle
                        CloseIndex
      
    3. Index entry routines:
           IX_IndexHandle:: InsertEntry (value, RID)
                            DeleteEntry (value, RID)
           Also ForcePages
      
    4. Index scans:
           IX_IndexScan:: OpenScan (index, comp, value, pin-hint)
                          GetNextEntry -> returns RID
                          CloseScan
      

    Suggested file structure

    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    

    Deletion

    Completely correct deletion (extra credit) is surprisingly difficult.

    Alternative 1 (full credit): lazy deletion

    
    
    
    
    
    
    
    
    

    Alternative 2 (not quite full credit): tombstones

    
    
    
    
    
    
    
    
    
    Question: What global assumption about database behavior makes above schemes okay?
    
    
    

    How clients will use IX scans

    1. Selection
      Select * From R Where R.A = 10
      
      
      
      
      
      
      
      
      
      => May assume no inserts or deletes during scan

    2. Join
      Select * From R,S Where R.A = S.A
      
      
      
      
      
      
      
      
      
      
      
      
      => May assume no inserts or deletes during scan

    3. Update
      Update R Set R.A = R.A + 5 Where R.B = 10
      
      
      
      
      
      
      
      
      
      
      
      
      => May prohibit index scans when scanned attribute is also update attribute (Halloween problem and implementation complexity)
      => May assume no inserts or deletes during scan

    4. Delete
      Delete From R Where R.A = 10
      
      
      
      
      
      
      
      
      
      
      
      
      => Must work correctly
      => May assume no other scans of any form are underway
      => May assume no inserts or (other) deletes during scan

    Suggested implementation order

    1. Search (extend to Scan)
    2. Insert
    3. Delete