CS346 Lecture Notes
Indexing and B+ Tree Review,
Overview of IX Component

Primary goal

Quickly find all records (RIDs) in relation R where R.A = v

Some indexes also support:
Quickly find all records with R.A < v, R.A <= v, R.A > v, R.A >= v, v1 < R.A < v2, etc.
Some indexes also support:
Find all records in R sorted by R.A

Question: For what SQL operations are these capabilities useful?

Some terminology:

Primary Index: indexed relation is stored sorted by the indexed attribute
Secondary Index: indexed relation is not stored sorted by the indexed attribute (i.e., any index that is not Primary)
Clustering Index: indexed relation is stored such that tuples for each indexed attribute value are together on page(s), but not necessarily sorted
- Note: Primary => Clustering
- Sometimes the term clustered relation is used when a relation and clustering index on it are merged into one structure
Dense Index: every RID in the indexed relation appears in the index
Sparse Index: some RIDs in the indexed relation don't appear in the index (i.e., any index that is not Dense)
- Note: Sparse => Primary or Clustering, otherwise the index is not usable

Basic picture of a sparse, primary index:

Basic picture of a dense, secondary index:

Slightly different picture of a dense, secondary index:

=> In RedBase you will implement dense, secondary indexes

Two basic index types

Trees and Hashing

Trees Hashing

Instances
.
.

.......................................................................................

.......................................................................................

Search (A=v)
over N records
.

.......................................................................................

.......................................................................................

Insert or delete
record
.

.......................................................................................

.......................................................................................

Search (A<v)
over N records
.

.......................................................................................

.......................................................................................

	Trees	Hashing
Instances . .	.......................................................................................	.......................................................................................
Search (A=v) over N records .	.......................................................................................	.......................................................................................
Insert or delete record .	.......................................................................................	.......................................................................................
Search (A<v) over N records .	.......................................................................................	.......................................................................................

RedBase this year: B+ trees

Possible extensions: linear hashing, extendible hashing, R trees

B+ trees

Variant of B-trees

In B-trees record pointers (RIDs) appear in internal nodes and leaves
In B+ trees record pointers (RIDs) appear only in leaves

Question: What are two advantages of B+ trees over B-trees?

(1)


(2)

Basic structure of a B+ tree:

Each B+ tree has a predefined maximum node capacity N (fanout <= N+1).

Structure of an internal node:

Structure of a leaf node:

Alternate structure for leaf nodes:

Algorithms

Search (A = v)

Search (A < v)

Search (v1 < A < v2)

Search (A > v)

Insert (v, RID)

Delete (v, RID)

Halloween problem

Consider the SQL update command:

   Update R
   Set R.A = R.A + 5
   Where R.A > 10

Suppose a B+ tree is used to find tuples with R.A > 10:

   loop
     fetch next RID from index for R.A > 10
     find record, update R.A
     delete old R.A index entry for RID
     add new R.A index entry for RID
   end loop

Question: What's wrong with this algorithm?

Question: Do we have the same problem with hash indexes?

RedBase IX component

As always, the information given in lecture is a cursory overview and you should read the spec provided in the IX Component document very carefully.

Basic setup of relations and indexes:

Architecture:

IX component does not call RM component
Clients will call both IX and RM
- Example: insert record + insert index entries
- Example: fetch index entry (RID), then fetch record

Interface

Startup: IX_Manager class -- one instance, PF_Manager is parameter

Index routines:

     IX_Manager:: CreateIndex (file, index#, attr-info)
                  DestroyIndex
                  OpenIndex -> initialized index handle
                  CloseIndex

Index entry routines:

     IX_IndexHandle:: InsertEntry (value, RID)
                      DeleteEntry (value, RID)
     Also ForcePages

Index scans:

     IX_IndexScan:: OpenScan (index, comp, value, pin-hint)
                    GetNextEntry -> returns RID
                    CloseScan

Suggested file structure

Deletion

Completely correct deletion (extra credit) is surprisingly difficult.

Alternative 1 (full credit): lazy deletion

Alternative 2 (not quite full credit): tombstones

Question: What global assumption about database behavior makes above schemes okay?

How clients will use IX scans

Selection
Select * From R Where R.A = 10
=> May assume no inserts or deletes during scan
Join
Select * From R,S Where R.A = S.A
=> May assume no inserts or deletes during scan
Update
Update R Set R.A = R.A + 5 Where R.B = 10
=> May prohibit index scans when scanned attribute is also update attribute (Halloween problem and implementation complexity)
=> May assume no inserts or deletes during scan
Delete
Delete From R Where R.A = 10
=> Must work correctly
=> May assume no other scans of any form are underway
=> May assume no inserts or (other) deletes during scan

Suggested implementation order

Search (extend to Scan)
Insert
Delete

CS346 Lecture Notes Indexing and B+ Tree Review, Overview of IX Component