CS346 - Spring 2007
Database System Implementation

RedBase Project
Project FAQ (look here first!)

Project Part                   Handout            Due Date
Paged File Component           PF Specification   supplied
Record Management Component    RM Specification   Sunday April 15
Indexing Component             IX Specification   Sunday April 29
System Management Component    SM Specification   Sunday May 6
Query Language Component       QL Specification   Sunday May 27
Personal Extension             EX Specification   Proposal due Mon. May 21
Demos                                             Mon.-Wed. June 11-13

Supporting Documents
Logistics: Setting Up, Testing, Submission Process, and Grading
Using Purify
Policy on memory use
RedBase Statistics Tracker (optional)

Project Overview
The focal point of the course is the RedBase project. RedBase stands for Relational Database, and also alludes to Stanford's color. (We know, Stanford's color is really Cardinal, but CardBase doesn't have as much of a ring to it.) RedBase is a complete single-user relational database management system. It involves a significant amount of coding, and the project must be completed by each individual student -- teams are not permitted. The project is highly structured, but there is enough slack in the specification so that creativity is both allowed and required. The basic project is divided into four parts:

  1. The Record Management (RM) Component: In this part you will implement a set of functions for managing unordered files of database records. This component will rely on a Paged File (PF) component that we will provide. The Paged File component performs low-level file I/O at the granularity of pages.

  2. The Indexing (IX) Component: In this part you will implement a facility for building indexes on records stored in unordered files. Your indexing facility will be based on B+ trees. The Indexing component will rely on the Paged File component.

  3. The System Management (SM) Component: In this part you will implement various database and system utilities, including data definition commands and catalog management. The System Management component will rely on the Record Management and Indexing components from Parts 1 and 2. It also will use a command-line parser, which we will provide.

  4. The Query Language (QL) Component: In this part you will implement RQL -- the RedBase Query Language. RQL consists of user-level data manipulation commands, both queries and updates. The Query Language component will rely on the three components from Parts 1-3, and it will use the command-line parser that we are providing.
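
To make the layering in Part 1 concrete, here is a minimal sketch of a record file built on top of a paged file. Everything in it is an assumption made for illustration: the names ToyPagedFile, ToyRecordFile, and RecordId, the 4096-byte page size, the in-memory pages, and the page layout (a slot counter followed by fixed-size records). It is not the PF interface we provide, nor the RM interface you will implement.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

constexpr std::size_t kPageSize = 4096;  // assumed page size, for illustration

// Hypothetical stand-in for a paged file: fixed-size pages addressed by
// page number. Pages live in memory here instead of on disk.
class ToyPagedFile {
    using Page = std::array<char, kPageSize>;
    std::vector<Page> pages_;
public:
    // Appends a zero-filled page and returns its page number.
    int allocatePage() {
        pages_.push_back(Page{});
        return static_cast<int>(pages_.size()) - 1;
    }
    // Returns a pointer to the page's bytes (throws if out of range).
    char* page(int pageNum) {
        return pages_.at(static_cast<std::size_t>(pageNum)).data();
    }
    int pageCount() const { return static_cast<int>(pages_.size()); }
};

// A record is identified by the page it lives on and its slot in that page.
struct RecordId { int page; int slot; };

// Hypothetical unordered file of fixed-size records. Assumed page layout:
// an int holding the number of used slots, followed by the records.
class ToyRecordFile {
public:
    ToyRecordFile(ToyPagedFile& pf, std::size_t recordSize)
        : pf_(pf), recordSize_(recordSize) {
        assert(recordSize > 0 && recordSize <= kPageSize - sizeof(int));
        slotsPerPage_ = static_cast<int>((kPageSize - sizeof(int)) / recordSize);
    }
    // Copies one record into the last page, allocating a new page when full.
    RecordId insert(const void* data) {
        int last = pf_.pageCount() - 1;
        if (last < 0 || usedSlots(last) == slotsPerPage_) last = pf_.allocatePage();
        int slot = usedSlots(last);
        std::memcpy(slotPtr(last, slot), data, recordSize_);
        setUsedSlots(last, slot + 1);
        return RecordId{last, slot};
    }
    // Copies the record at rid into out; returns false if rid is invalid.
    bool get(RecordId rid, void* out) {
        if (rid.page < 0 || rid.page >= pf_.pageCount()) return false;
        if (rid.slot < 0 || rid.slot >= usedSlots(rid.page)) return false;
        std::memcpy(out, slotPtr(rid.page, rid.slot), recordSize_);
        return true;
    }
private:
    int usedSlots(int page) {
        int n;
        std::memcpy(&n, pf_.page(page), sizeof n);
        return n;
    }
    void setUsedSlots(int page, int n) { std::memcpy(pf_.page(page), &n, sizeof n); }
    char* slotPtr(int page, int slot) {
        return pf_.page(page) + sizeof(int) + static_cast<std::size_t>(slot) * recordSize_;
    }
    ToyPagedFile& pf_;
    std::size_t recordSize_;
    int slotsPerPage_ = 0;
};
```

The record layer never touches storage directly: it only asks the paged layer for pages and interprets their bytes. The sketch omits deletion, free-space tracking, and scans, which are exactly where the design decisions in Part 1 lie.
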
In addition to the basic project, each student will design and implement a significant extension to RedBase. We expect that students will get ideas about extensions as the course progresses. Possibilities include aspects of record management, long fields (BLOBs), object management, text management, sorting, indexing, join algorithms, clustering, statistics and query optimization, query language extensions, OLAP, XML, concurrency control, recovery, security and authorization, compression, networking, versioning, external functions, stored procedures, views, integrity constraints, triggers, user and application interfaces, web integration, etc. (We're certainly open to additional suggestions.) Each student will submit a proposal for their project extension. Students will get feedback on their proposal, then will implement their extension as the fifth and final part of the project. Complete projects will be demonstrated to the instructor and TA during finals week.

Project Help Sessions
The TA will conduct a help session at 7:00 PM each Thursday evening preceding the Sunday due date for Parts 1-4 of the basic RedBase project, and an extra help session the week that Part 4 is assigned. The purpose of the help sessions is to discuss design decisions for each project part and to answer commonly asked programming-related questions in a group setting. The help sessions are not required, and good students should have no trouble completing the project without attending the help sessions.

RedBase I/O Efficiency Contest
As the old saying goes, the three most important aspects of a database management system are efficiency, efficiency, and efficiency. To encourage you to take efficiency into consideration as you develop your RedBase system, we will be conducting a RedBase Efficiency Contest when the QL component is complete. While there are several important efficiency measures in a DBMS, we will focus on I/O performance. We will measure each student's RedBase system on a set of benchmark queries and updates in the RQL language and will count the number of I/Os -- the fewer the better, of course. All students enter the contest automatically when they submit their QL component, unless they prefer to be excluded. The prizes are:

Late Policy
The late policy follows. There will be absolutely no exceptions to this late policy, so please don't even ask! It's crucial that students stay on schedule in this course -- RedBase is a very big project.

Computer Accounts
You will implement RedBase using the Unix workstations on the second floor of Sweet Hall (the sagas, elaines, myths, etc.). To open an account on these machines if you do not already have a leland ID, type open at the "login:" prompt, or telnet to open.stanford.edu and use login name open, then follow the instructions. Directory /usr/class/cs346 will contain files and subdirectories for the class.

Students with access to their own workstations or Linux PCs are welcome to try to use them, but you will need to copy all provided software from the Stanford workstations. While we will do our best to ensure that the code we provide is portable, we cannot guarantee portability across all platforms. Likewise, while the TA may attempt to help with platform-specific problems, our focus will be on the Sweet Hall workstations.

Your programs will be submitted electronically and they will be tested by the TA on a Sweet Hall Sun workstation. It will be your responsibility to ensure that your programs compile and run correctly on that platform before submitting them.

More on Programming
We will provide code for the Paged File (PF) component of RedBase and for some commonly-used routines in other components. We will also provide a command-line parser that you will use for Parts 3 and 4 of the project. Specifications for the code we provide, along with specifications for each component that you will implement, will be given as object-oriented interfaces in the C++ programming language. We will help you get started with your programming by providing sample Makefiles, header files, etc. In addition, for some of the project parts we will provide test suites in advance of the due date, although these tests will not be comprehensive.
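
As a rough illustration of what a specification given as an object-oriented C++ interface can look like, together with a small test written against it, consider the sketch below. The Catalog interface, the InMemoryCatalog class, the Status codes, and catalogSmokeTest are all hypothetical names invented for this example; they are not taken from any RedBase specification.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical return-code convention: zero is success, nonzero is an error.
using Status = int;
constexpr Status kOk = 0;
constexpr Status kTableExists = 1;
constexpr Status kNoSuchTable = 2;

// Hypothetical component interface. Callers depend only on these operations,
// not on how they are implemented.
class Catalog {
public:
    virtual ~Catalog() = default;
    // Registers a table with the given number of attributes.
    virtual Status createTable(const std::string& name, int attrCount) = 0;
    // Looks up how many attributes a table has.
    virtual Status attrCount(const std::string& name, int& count) const = 0;
};

// One possible implementation behind the interface, kept in memory.
class InMemoryCatalog : public Catalog {
public:
    Status createTable(const std::string& name, int attrCount) override {
        // emplace leaves an existing entry untouched and reports the clash.
        return tables_.emplace(name, attrCount).second ? kOk : kTableExists;
    }
    Status attrCount(const std::string& name, int& count) const override {
        auto it = tables_.find(name);
        if (it == tables_.end()) return kNoSuchTable;
        count = it->second;
        return kOk;
    }
private:
    std::map<std::string, int> tables_;
};

// A small test that uses only the interface, so it can be run unchanged
// against any implementation.
bool catalogSmokeTest(Catalog& catalog) {
    int n = 0;
    return catalog.createTable("students", 3) == kOk
        && catalog.createTable("students", 5) == kTableExists
        && catalog.attrCount("students", n) == kOk && n == 3
        && catalog.attrCount("courses", n) == kNoSuchTable;
}
```

Because the test sees only the interface, it checks behavior rather than implementation details. Tests of this kind cover only the cases they name, so passing them says nothing about cases they leave out.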