CS346 - Spring 2011
Database System Implementation

RedBase Part 1: The Record Management Component Due Sunday April 10

Late Policy Reminder
Introduction
Header Files
RM Interface
Return Codes and Error Handling
Implementation Suggestions
Documentation
Grading
Testing and Submission
RedBase I/O Efficiency Contest

Late Policy Reminder

Project parts are due at 11:59 PM on the due date. For project parts submitted after 11:59 PM on the due date, there is a 1% penalty applied to that project part score for each hour late. Each student is allocated 2 free days and 12 late hours for the entire course with no penalty. Once the free allowance has been used, penalties begin to be applied. There will be absolutely no exceptions to this late policy. It's crucial that students stay on schedule in this course -- RedBase is a very big project.

Introduction

The first part of the RedBase system you will implement is the Record Management (RM) component. The RM component provides classes and methods for managing files of unordered records. All class names, return codes, constants, etc. in this component should begin with the prefix RM. The RM component is a client to the PF component: your RM methods will make calls to the PF methods we have provided. The PF interface is described in a separate document.

Your RM component will store records in paged files provided by the PF component. To manage file contents conveniently, you will probably want to use the first page of each file as a special header page. This page should contain free space information, as well as whatever other information (related to the file as a whole) you find useful for your implementation. You also must decide exactly how records will be laid out on pages of PF files. Your design task is simplified by the fact that each file will contain a set of records that are all the same size (although record sizes may differ across files). Fixed size records make it easier to manage the records and free space on each page, and fixed size records permit record identifiers within a given file to be a simple combination of page number and record position. More detailed implementation suggestions are given below.

Header Files

Your RM component interface should be specified in an "external" header file, rm.h, which you will later "#include" in RM component clients. To save you some typing, we have created an initial header file that declares the classes and public methods of the RM interface described below. The file is called rm.h, and it is transfered automatically to your code directory when you run the setup script for the first part of the project (see the RedBase Logistics document for details). You must implement all of the methods specified in this interface. If you would like to implement additional functionality in the RM component (either now or later), you are welcome to extend the interface as you please. However, please do not add any new #include statements to rm.h.
The record identifier (RID) class is declared in a separate header file rm_rid.h. This declaration is not included in rm.h since there are some components of RedBase that need the RID definition but no other RM classes. You must implement all of the methods specified in this interface.
For coding convenience and modularity, you may want to create a separate "internal" header file (rm_internal.h, say) containing declarations used only by the RM component. For example, you might choose to put file and page header declarations in this file, since they will not be used by other components.
The fourth relevant header file for this component is a "global" header file, redbase.h. Like rm.h, this file is transfered automatically to your code directory when you run the setup script for the first part of the project. You may want to add declarations to this file as your coding progresses, but please do not add any new #include statements to redbase.h. This header file should contain those declarations that are needed throughout the RedBase system.

RM Interface

The RM interface you will implement consists of five classes: the RM_Manager class, the RM_FileHandle class, the RM_FileScan class, the RM_Record class, and the RID class. In addition, there is an RM_PrintError routine for printing messages associated with nonzero RM return codes. Note that we have not specified copy constructors or overloaded = operators in any of the RM classes (see the PF_Manager class in the PF Component document for some discussion of these methods). You may add these methods if you wish, and we suggest that you use them in any classes that allocate dynamic memory. Note also that we have specified only the public methods of each class.

All RM component public methods except constructors and destructors should return 0 if they complete normally and a nonzero return code otherwise. Return codes and error handling are discussed in detail in a later section of this document.

* RM_Manager Class *

The RM_Manager class handles the creation, deletion, opening, and closing of files of records in the RM component. Your program should create exactly one instance of this class, and all requests for RM component file management should be directed to that instance. Below, the public methods of the class declaration are shown first, followed by descriptions of the methods. The first two methods in the class declaration are the constructor and destructor methods for the class. All necessary initialization of the RM component should take place within the constructor for the RM_Manager class. Note that this constructor takes as a parameter the instance of the PF_Manager class, which you should already have created (see the PF Component document). Any necessary clean-up in the RM component should take place within the destructor for the RM_Manager class. The constructor and destructor methods are not described further for this class.

class RM_Manager {
  public:
       RM_Manager  (PF_Manager &pfm);            // Constructor
       ~RM_Manager ();                           // Destructor
    RC CreateFile  (const char *fileName, int recordSize);  
                                                 // Create a new file
    RC DestroyFile (const char *fileName);       // Destroy a file
    RC OpenFile    (const char *fileName, RM_FileHandle &fileHandle);
                                                 // Open a file
    RC CloseFile   (RM_FileHandle &fileHandle);  // Close a file
 };

RC CreateFile (const char *fileName, int recordSize)

This method should call PF_Manager::CreateFile to create a paged file called fileName. The records in this file will all have size recordSize. This method should initialize the file by storing appropriate information in the header page. Although recordSize will usually be much smaller than the size of a page, you should compare recordSize with PF_PAGE_SIZE and return a nonzero code if recordSize is too large for your RM component to handle.

RC DestroyFile (const char *fileName)

This method should destroy the file whose name is fileName by calling PF_Manager::DestroyFile.

RC OpenFile (const char *fileName, RM_FileHandle &fileHandle)

This method should open the file called fileName by calling PF_Manager::OpenFile. If the method is successful, the fileHandle object should become a "handle" for the open RM component file. The file handle is used to manipulate the records in the file (see the RM_FileHandle class description below). As in the PF component, it should not be an error if a client opens the same RM file more than once, using a different fileHandle object each time. Each call to the OpenFile method should create a new instance of the open file. You may assume if a file has more than one opened instance then each instance of the open file may be read but will not be modified. If a file is modified while opened more than once, you need not guarantee the integrity of the file or the RM component. You may also assume that DestroyFile will never be called on an open file.

RC CloseFile (RM_FileHandle &fileHandle)

This method should close the open file instance referred to by fileHandle by calling PF_Manager:: CloseFile.

* RM_FileHandle Class *

The RM_FileHandle class is used to manipulate the records in an open RM component file. To manipulate the records in a file, a client first creates an instance of this class and passes it to the RM_Manager::OpenFile method described above. Descriptions of the constructor and destructor methods are not included for this class.

class RM_FileHandle {
  public:
       RM_FileHandle  ();                                  // Constructor
       ~RM_FileHandle ();                                  // Destructor
    RC GetRec         (const RID &rid, RM_Record &rec) const;
                                                           // Get a record
    RC InsertRec      (const char *pData, RID &rid);       // Insert a new record,
                                                           //   return record id
    RC DeleteRec      (const RID &rid);                    // Delete a record
    RC UpdateRec      (const RM_Record &rec);              // Update a record
    RC ForcePages     (PageNum pageNum = ALL_PAGES) const; // Write dirty page(s)
                                                           //   to disk
 };

RC GetRec (RID &rid, RM_Record &rec)

For this and the following methods, it should be a (positive) error if the RM_FileHandle object for which the method is called does not refer to an open file. This method should retrieve the record with identifier rid from the file. It should be a (positive) error if rid does not identify an existing record in the file. If the method succeeds, rec should contain a copy of the specified record along with its record identifier (see the RM_Record class description below).

RC InsertRec (char *pData, RID &rid)

This method should insert the data pointed to by pData as a new record in the file. If successful, the return parameter &rid should point to the record identifier of the newly inserted record.

RC DeleteRec (RID &rid)

This method should delete the record with identifier rid from the file. If the page containing the record becomes empty after the deletion, you can choose either to dispose of the page (by calling PF_Manager::DisposePage) or keep the page in the file for use in the future, whichever you feel will be more efficient and/or convenient.

RC UpdateRec (RM_Record &rec)

This method should update the contents of the record in the file that is associated with rec (see the RM_Record class description below). This method should replace the existing contents of the record in the file with the current contents of rec.

RC ForcePages (PageNum pageNum = ALL_PAGES) const

This method should call the corresponding method PF_FileHandle::ForcePages in order to copy the contents of one or all dirty pages of the file from the buffer pool to disk.

* RM_FileScan Class *

The RM_FileScan class provides clients the capability to perform scans over the records of an RM component file, where a scan may be based on a specified condition. As usual, the constructor and destructor methods are not described.

class RM_FileScan {
  public:
       RM_FileScan  ();                                // Constructor
       ~RM_FileScan ();                                // Destructor
    RC OpenScan     (const RM_FileHandle &fileHandle,  // Initialize file scan
                     AttrType      attrType,
                     int           attrLength,
                     int           attrOffset,
                     CompOp        compOp,
                     void          *value,
                     ClientHint    pinHint = NO_HINT);
    RC GetNextRec   (RM_Record &rec);                  // Get next matching record
    RC CloseScan    ();                                // Terminate file scan
 };

RC OpenScan (const RM_FileHandle &fileHandle, AttrType attrType, int attrLength, int attrOffset, CompOp compOp, void *value, ClientHint pinHint = NO_HINT)

This method should initialize a scan over the records in the open file referred to by parameter fileHandle. During the scan, only those records whose specified attribute satisfies the specified condition (a comparison with a value) should be retrieved. If value is a null pointer, then there is no condition and all records are retrieved during the scan. If value is not a null pointer, then value points to the value that attributes are to be compared with.

Parameters attrType and attrLength indicate the type and length of the attribute being compared: either a 4-byte integer, a 4-byte floating point number, or a character string with a length between 1 and MAXSTRINGLEN bytes. (MAXSTRINGLEN = 255 is defined in redbase.h.) Type AttrType is defined in redbase.h as follows: INT for integer, FLOAT for floating point number, and STRING for character string. You will need to cast the value into the appropriate type for the attribute (or, in the case of an integer or float, copy it into a separate variable to avoid alignment problems). If a character string has length n, then the attribute and the value will each be exactly n bytes long. They will not be <= n bytes, i.e., no "padding" is required, and they are not null-terminated. Parameter attrOffset indicates where the attribute is found within the contents of each record. Parameter compOp indicates the way that the record's attribute value should be compared with the value parameter. The different values for compOp are defined in redbase.h as follows:

EQ_OP equal (i.e., attribute = value)
LT_OP less-than (i.e., attribute < value)
GT_OP greater-than (i.e., attribute > value)
LE_OP less-than-or-equal (i.e., attribute <= value)
GE_OP greater-than-or-equal (i.e., attribute >= value)
NE_OP not-equal (i.e., attribute <> value)
NO_OP no comparison (when value is a null pointer)

Parameter pinHint is included so that higher-level RedBase components using an RM component file scan can suggest a specific page-pinning strategy for the RM component to use during the file scan, to achieve maximum efficiency. (See "The Buffer Pool of Pages" section of the PF Component document for a discussion of page pinning.) Type ClientHint is defined in redbase.h, and you will need to define constants in addition to NO_HINT if you plan to use it. You are free to implement only one page-pinning strategy and ignore the pinHint parameter, or you may implement more than one strategy based on pinHint values now, or you may implement one strategy now and consider adding new strategies later when you implement clients of the RM component. Please note that using pinHint is optional, and only default value NO_HINT will be passed to OpenScan in the TA's test suite.

RC GetNextRec (RM_Record &rec)

This method should retrieve a copy of the next record in the file scan that satisfies the scan condition. If this method succeeds, rec should contain a copy of the record along with its record identifier. This method should return RM_EOF (which you should define) if there are no records left satisfying the scan condition. You may assume that RM component clients will not close the corresponding open file instance while a scan is underway.

RC CloseScan ()

This method should terminate the file scan.

* RM_Record Class *

The RM_Record class defines record objects. To materialize a record, a client creates an instance of this class and passes it to one of the RM_FileHandle or RM_FileScan methods that reads a record (described above). RM_Record objects should contain copies of records from the buffer pool, not records in the buffer pool itself.

class RM_Record {
  public:
       RM_Record  ();                     // Constructor
       ~RM_Record ();                     // Destructor

    RC GetData    (char *&pData) const;   // Set pData to point to
                                          //   the record's contents
    RC GetRid     (RID &rid) const;       // Get the record id
 };

RC GetData (char *&pData) const

For this and the following method, it should be a (positive) error if a record has not already been read into the RM_Record object for which the method is called. This method provides access to the contents (data) of the record. If the method succeeds, pData should point to the contents of the copy of the record created by RM_FileHandle::GetRec() or RM_FileScan::GetNextRec().

RC GetRid (RID &rid) const

If this method succeeds, rid should contain the identifier for the record.

* RID Class *

The RID class defines record identifier objects. A record identifier uniquely identifies a record within a given file, based on the record's page number in the file and slot number within that page. You may wonder why this class name does not begin with RM. The RID class actually will be used by several components of RedBase, but for convenience we have introduced it as part of the RM component. (Also, it's used so frequently that it's nice to keep the name short!)

class RID {
  public:
       RID        ();                        // Default constructor
       ~RID       ();                        // Destructor
       RID        (PageNum pageNum, SlotNum slotNum);
                                             // Construct RID from page and
                                             //   slot number
    RC GetPageNum (PageNum &pageNum) const;  // Return page number
    RC GetSlotNum (SlotNum &slotNum) const;  // Return slot number
 };

RID (PageNum pageNum, SlotNum slotNum)

This method should construct a new record identifier object from a given page number and slot number.

RC GetPageNum (PageNum &pageNum)

For this and the following method, it should be a (positive) error if the RID object for which the method is called is not a viable record identifier. (For example, a RID object will not be a viable record identifier if it was created using the default constructor and has not been passed to the RM_Record::GetRid method.) If this method succeeds, it should set pageNum to the record identifier's page number.

RC GetSlotNum (SlotNum &slotNum)

If this method succeeds, it should set slotNum to the record identifier's slot number.

*** RM_PrintError

void RM_PrintError (RC rc);

This routine should write a message associated with the nonzero RM return code rc onto the Unix stderr output stream. This routine has no return value.

Return Codes and Error Handling

In general you should model your return codes and error handling after the PF component. Please reread the relevant information in the PF Component document, and take a look at the files pf.h and redbase.h.

Each component specifies its own sets of nonzero return codes -- a set of positive codes and a set of negative codes. Recall that positive return codes indicate non-error exception conditions, or errors from which the system can recover or exit gracefully; negative return codes indicate errors from which the system cannot recover, or after which the database may be corrupted. You will see as the project progresses that it is very important to separate these types of return codes. Minimum and maximum values for the positive and negative return codes for each component are defined in redbase.h. Each component also provides a routine that takes one of its return codes and prints an appropriate message. In addition, you may choose to have a global PrintError routine that takes any return code and calls the appropriate component's PrintError routine (based on the code's value); this decision depends on your style of error handling and error propagation.

There are two ways errors may occur during execution of an RM component method: because an error was returned from a call to a PF component method, or because of an error or exception condition within the RM component. In either case, the RM method should return a nonzero code to its caller. For handling unexpected return codes from the PF component, we suggest that you simply pass the PF return code along, in which case your program structure should be such that an appropriate message is printed eventually by a global PrintError routine as described above.

          if (rc = pf_obj.Method(...)) {
             // unexpected return code
             return(rc); }

An alternative and more verbose method, which you might consider enabling via a compile-time flag, would be to print the message associated with the PF return code, then provide a different RM return code to the caller:

          if (rc = pf_obj.Method(...)) {
             // unexpected return code
             PF_PrintError(rc);
             return(RM_RETURNCODE); }

In addition to checking that calls to the PF component succeed as expected, it also is very important that you check the incoming parameters to RM methods for validity before doing anything to any file. In general, you should do your best to ensure that a client cannot corrupt RM component files or crash the RM component by passing incorrect arguments to an RM component method. (Hint: we may check this in our grading test suite!)

Implementation Suggestions

Here are some suggestions for your implementation. You are entirely free to use alternative design ideas if you believe they will improve the structure or performance of your code. You must not alter the RM methods specified above, since we plan to test your code using these methods. (In addition, higher-level components are designed with these methods in mind.) However, as mentioned earlier, you are free to extend the interface or alter it later if you find it helpful.

There are a variety of ways to keep track of free record slots on a given page. One efficient method is to use a bitmap: If each data page can hold n records, then you can store an n-bit bitmap in the page header indicating which slots currently contain valid records and which slots are available. Note that the size of the bitmap needed for each page is dependent on the number of records that can be stored on that page, which in turn is dependent on the record size for the file. Hence your bitmaps will be the same size for each page of a given file, but bitmap sizes should be different for different files. One way to implement variable-size bitmaps is as arrays of char's or int's, where the individual bits are manipulated using the operators &, |, and ^.

Note that under no circumstances should you place a limit on the total number of records that can be stored in a file -- each file should be able to grow arbitrarily large. (Admittedly there is an indirect limit because PageNum is defined as an integer, but it should be possible to change the type of PageNum -- say to a long integer -- without changing your code.)

File Scans

You may find it helpful to keep in mind how file scans (as implemented by class RM_FileScan) will be used by higher-level RedBase components. A file scan will be used either to iteratively return all tuples of a relation stored in an RM file, or to return all tuples in the relation that satisfy a given condition. Scans will be used when processing SQL-like Select, Delete, and Update statements, so each record returned will either become part of a query result, will be deleted, or will have a value changed. The insert statement will add a single tuple at a time, so records will not be inserted via a file scan.

Documentation

As described in the RedBase Logistics document, each student is expected to:

Produce a 1-2 page description of your design and testing in a plain text file called rm_DOC.
Include comments in your code.

The first requirement enables the TA to understand how you have designed your program, how you tested it, and any known bugs. Remember, the clearer this description is, the faster the TA will be able to evaluate your code, and the better mood he'll be in while he does it! In the code itself you should include enough comments to delineate and explain the functional units of the code, but you don't need to go overboard.

Please do not duplicate documentation. Your rm_DOC file should describe your overall design, novel features, key data structures, etc., while comments in the code should serve to explain the lower-level mechanical details. Also, as per the Honor Code statement provided in the General Information page, don't forget to cite in rm_DOC any assistance you received (from the instructor, TA, other students in the course, or anyone else) in program design or debugging.

Grading

As a reminder from the RedBase Logistics document, the grade for your program will be based on four aspects:

Functionality - that the program implements the specification and passes the TA's test suite
Documentation - as described above
Design Choices - based on an email message from the TA requesting details on particular aspects of the implementation (see RedBase Logistics)
Modularity/Correctness - usually full credit if your code is reasonably modular and doesn't have gaping inefficiencies

Testing and Submission

For your convenience, we have provided an initial set of tests for your RM component. The tests are in rm_test.cc, retrieved when you run the setup script for the first part of the project. Although these tests exercise some aspects of the RM component, they are not comprehensive. Thus, we strongly encourage you to augment them with a number of additional tests of your own in order to thoroughly exercise your code. During the grading process, we will use a much more comprehensive test suite in order to check your component for correctness.

Submission of the RM component -- along with all other project parts -- will take place electronically. Instructions are in the RedBase Logistics document. Your rm_DOC documentation file should be submitted with your code. The TA will be linking test code to your submitted librm.a, which should be the result of running "make" from within your submission directory. Everything must link correctly if the test program calls methods in rm.h. Also, please ensure that you have not removed the -DPF_STATS flag from the Makefile prior to compiling the code that you submit.

RedBase I/O Efficiency Contest

If you're angling to win the RedBase Efficiency Contest then it is important to consider I/O efficiency in every component, starting with this one. Remember that one I/O is counted each time a page is read into or written out from the buffer pool. There are a number of places where you can save one, two, or even more I/O's by careful coding of RM component methods. Saving a few I/O's now will pay off greatly as you code the higher-level RedBase components.

`EQ_OP`	equal (i.e., attribute = value)
`LT_OP`	less-than (i.e., attribute < value)
`GT_OP`	greater-than (i.e., attribute > value)
`LE_OP`	less-than-or-equal (i.e., attribute <= value)
`GE_OP`	greater-than-or-equal (i.e., attribute >= value)
`NE_OP`	not-equal (i.e., attribute <> value)
`NO_OP`	no comparison (when `value` is a null pointer)

*** RM_Manager Class ***