CS346 - Spring 2011
Database System Implementation

RedBase Part 3: The System Management Component
Due Sunday May 1

Introduction
The third part of the RedBase system you will implement is the System Management (SM) component. This component provides a "potpourri" of capabilities:

  1. Unix command line utilities - for creating and destroying RedBase databases, invoking the system

  2. Data definition language (DDL) commands - for creating and dropping relations, creating and dropping indexes

  3. System utilities - for bulk loading, help, printing relations, setting parameters

  4. Metadata management - for maintaining system catalogs
Since command line utilities are invoked at the Unix level, you will write stand-alone programs for each of them. DDL commands and system utilities are invoked from a RedBase prompt. You don't need to worry about managing the prompt, user interaction, or command parsing -- we provide a parser that handles user interaction and recognizes all RedBase commands. The parser invokes methods implemented by your SM and later QL (Part 4) components.
More details on the parser are given below. Metadata management is internal to the system.

Throughout this document we assume (and strongly suggest) a design in which each RedBase relation is stored in its own RM component file, and each tuple of a RedBase relation is stored as a record in the appropriate file. Consequently, "files" are now sometimes referred to as "relations," and "records" are now referred to as "tuples."

Command Line Utilities
You should implement the following three utilities:

Separate stand-alone programs should be written for each of these utilities. You should create executable files and name them as dbcreate, dbdestroy, and redbase, so that the utilities can be invoked from the Unix command prompt. We are providing program "shells" for two of these commands, which you can use as starting points for your implementation. They are located in redbase.cc and dbcreate.cc, obtained when you run the setup script with argument "3" (for project part 3) -- there's more on setup for this project part described below.

DBname is a user-specified name associated with a particular database. Your RedBase system should be able to manage multiple databases, although it is invoked (using the redbase command) for only one database at a time. All of the files associated with a given database DBname -- files containing relations, indexes, and metadata -- should be stored in a subdirectory for that database; the subdirectory can be called DBname. Command dbcreate should set up the subdirectory. Command dbdestroy should destroy the subdirectory and all of its contents. One of the first things command redbase should do is change the current working directory to the subdirectory for the specified DBname.

The "dbcreate" Command

The dbcreate command is invoked with the name of the new database. The following approximate code segment implements this command:
   main (
    int argc,      // length of array argv
    char **argv )  // argv is an array of pointers
                   // argv[0] points to the name of the command
                   // argv[1] points to argument DBname
   {
      char *dbname;
      char command[80] = "mkdir ";

      if (argc != 2) {
         cerr << "Usage: " << argv[0] << " dbname \n";
         exit(1);
      }

      // The database name is the second argument
      dbname = argv[1];

      // Create a subdirectory for the database
      system (strcat(command,dbname));

      if (chdir(dbname) < 0) {
         cerr << argv[0] << " chdir error to " << dbname << "\n";
         exit(1);
      }

      // Create the system catalogs
      ...
   }

The "dbdestroy" Command

The dbdestroy command is invoked with the name of the database to be destroyed. The code for dbdestroy is similar in spirit to the code for dbcreate. In dbdestroy, the following system call should be used:
   char command[80] = "rm -r ";
   system(strcat(command, dbname));
You don't need to worry about individually removing each file for DBname -- all files in the subdirectory will be removed when you invoke rm with the -r option.

The "redbase" Command

The redbase command is invoked with the name of the database to be used for the RedBase session. This command will now be the main driver of your RedBase system. The following approximate (and incomplete) code segment is used in the implementation of this command:
   main(...)
    {
      ...
      // initialize RedBase components
      PF_Manager pfm;
      RM_Manager rmm(pfm);
      IX_Manager ixm(pfm);
      SM_Manager smm(ixm, rmm);
      QL_Manager qlm(smm, ixm, rmm);
      // open the database
      if (rc = smm.OpenDb(dbname)) ...
      // call the parser
      RBparse(pfm, smm, qlm);
      // close the database
      if (rc = smm.CloseDb()) ...
    }
Methods SM_Manager::OpenDb and SM_Manager::CloseDb are described in the SM interface below. We are providing stub files defining the QL_Manager class (more on this later). You do not need to write routine RBparse -- it is provided by the parser (see parse.y if you're interested). When RBparse is called, a loop repeatedly prompts the user for a command and then calls appropriate SM or QL methods to process the command. The command loop terminates and the call to RBparse returns when the user types "exit;" at the parser prompt. (Termination also occurs if a negative error code is returned from an SM or QL method call, or if an end-of-file character is generated accidentally.)

RedBase System Commands
Once the RedBase system has been started up, the user submits commands that are classified into data definition language (DDL) commands, system utility commands, and data manipulation language (DML) commands. You will implement DDL commands and system utilities in this component. DML commands will be implemented in Part 4 of the project. All RedBase command names are case-insensitive, and all commands are terminated by a semicolon. (It's a common mistake among RedBase users to forget to type the semicolon.)

To describe the DDL and system utility commands, we'll first specify the syntax of the commands and explain what they should do from a user's perspective. Then we'll specify the SM component interface, which includes the methods that support the commands, as well as methods OpenDb and CloseDb.

DDL Commands

  1. create table relName(attrName1 Type1, attrName2 Type2, ..., attrNameN TypeN);

    The create table command creates a relation with the specified name and schema. Relation and attribute names are limited to MAXNAME = 24 characters each, and must begin with a letter. Within a given database every relation name must be unique, and within a given relation every attribute name must be unique. (However, relation names may be duplicated across databases, and attribute names may be duplicated across relations.) Every relation must have at least one and no more than MAXATTRS = 40 attributes. The syntax of each Type argument is a one-character type specification -- "i", "f", or "c", for integer, float, or character string, respectively -- followed by an integer length. For "i" and "f" the only valid length is 4; for "c" valid lengths range from 1 to MAXSTRINGLEN = 255. As an example, "c50" is the type specification for an attribute whose values are character strings of length 50. Note that when the type specification "cN" is given, the attribute must be able to store strings of length N -- the system should not use one of the characters for null termination. The constants MAXNAME, MAXATTRS, and MAXSTRINGLEN are defined in redbase.h.

  2. drop table relName;

    The drop table command destroys relation relName, along with all indexes on the relation.

  3. create index relName(attrName);

    The create index command creates an index on attribute attrName of relation relName and builds the index for the current tuples in the relation. Only one index may be created for each attribute of a relation.

  4. drop index relName(attrName);

    The drop index command destroys the index on attribute attrName of relation relName.

System Utilities

System utility commands are invoked from the RedBase prompt, just like DDL (and later DML) commands.
  1. load relName("FileName");

    The load utility performs bulk loading of the named relation from the specified Unix file: all tuples specified in the load file are inserted into the relation. The FileName should be a complete Unix path enclosed in quotes (e.g., "/usr/class/cs346/redbase/data/student.data"). This file holds the data to be loaded into the relation in ASCII format. Every tuple to be inserted is on a separate line in the file, with attribute values separated by commas and appearing in the same order as in the create table command that was executed for relation relName.

    Integer attribute values are specified in the ASCII load file as, e.g., 10 or -5, float values are specified as, e.g., 3.5E-3, and you may assume these values are in the right format. Character string values are specified as, e.g., Smith (without quotes). You may assume that character string attribute values will not contain commas. Character strings in the load file can be of any length up to the length specified for the corresponding attribute, including zero length (no characters for that field in the load file). If a character string is too long, you may silently truncate it, or you may generate a nonzero return code and stop loading, whichever behavior you prefer.

    Some example load files can be found in directory /usr/class/cs346/redbase/data/. We strongly suggest that you create your own additional data files for loading, as well as for future testing and experimentation. If you would like to share your data files with the class, you may place them in directory /usr/class/cs346/redbase/data/, as discussed in the RedBase Logistics document.

  2. help [relName];

    The square brackets here are not part of the command syntax -- they indicate that relName is optional. If a relName is not specified, then the help utility prints the names of all relations in the database. (You may include additional information if you like.) If a relName is specified, then the help utility prints the name, type, length, and offset of each attribute in the specified relation, together with any other information you feel may be useful.

  3. print relName;

    The print utility displays the current set of tuples in relation relName.

  4. set Param = "Value";

    The set utility allows the user to set system parameters without needing to recompile the system, or even exit the RedBase prompt. You should determine if there are any parameters you might find useful to control in this manner. (Examples might be level of tracing information produced, debugging flags, etc.) You are not required to implement set for any particular parameters; we're just providing the necessary "hooks" in case you find it convenient to do so.

SM Interface
When the parser recognizes a DDL or system utility command, it calls a public method of the SM_Manager class, described below. SM_Manager is the only class in the SM interface. The SM interface also includes an SM_PrintError routine for printing messages associated with nonzero SM return codes. An initial header file for the SM interface is in sm.h, obtained via the setup script. An additional SM "stub" file contains a shell of all of the public methods of the SM class. You should rename sm_stub.cc as sm_manager.cc and complete the implementation. As usual, all SM component public methods (except constructors and destructors) should return 0 if they complete normally and a nonzero return code otherwise.
Parser handling of nonzero return codes is discussed below.

SM_Manager Class


   // Used by SM_Manager::CreateTable
   struct AttrInfo {
      char     *attrName;           // Attribute name
      AttrType attrType;            // Type of attribute
      int      attrLength;          // Length of attribute
   };

   // Used by Printer class
   struct DataAttrInfo {
      char     relName[MAXNAME+1];  // Relation name
      char     attrName[MAXNAME+1]; // Attribute name
      int      offset;              // Offset of attribute 
      AttrType attrType;            // Type of attribute 
      int      attrLength;          // Length of attribute
      int      indexNo;             // Attribute index number
   };


   class SM_Manager {
     public:
          SM_Manager  (IX_Manager &ixm, RM_Manager &rmm);  // Constructor
          ~SM_Manager ();                                  // Destructor
       RC OpenDb      (const char *dbName);                // Open database
       RC CloseDb     ();                                  // Close database
       RC CreateTable (const char *relName,                // Create relation
                       int        attrCount,
                       AttrInfo   *attributes);
       RC DropTable   (const char *relName);               // Destroy relation
       RC CreateIndex (const char *relName,                // Create index
                       const char *attrName);
       RC DropIndex   (const char *relName,                // Destroy index
                       const char *attrName);
       RC Load        (const char *relName,                // Load utility
                       const char *fileName);
       RC Help        ();                                  // Help for database
       RC Help        (const char *relName);               // Help for relation
       RC Print       (const char *relName);               // Print relation
       RC Set         (const char *paramName,              // Set system parameter
                       const char *value);
   };

RC OpenDb (const char *dbName)

Recall that this method, along with method CloseDb, is called by your code implementing the redbase command line utility. This method should change to the directory for the database named *dbName (using system call chdir), then open the files containing the system catalogs for the database. System catalogs are described below.

RC CloseDb ()

This method should close all open files in the current database. Closing the files will automatically cause all relevant buffers to be flushed to disk.

RC CreateTable (const char *relName, int attrCount, AttrInfo *attributes)

This method creates a new relation named *relName. Argument attrCount indicates the number of attributes in the relation (between 1 and MAXATTRS). Argument attributes is an array of length attrCount. For the ith attribute of the new relation, the ith element of array attributes contains the name, type, and length of the attribute (see definition of AttrInfo above).

This method should first update the system catalogs: a tuple for the new relation should be added to a catalog relation called relcat, and a tuple for each attribute should be added to a catalog relation called attrcat. (Catalogs relcat and attrcat describe all relations and attributes in the database, respectively; details are given below.) To create entries for the new relation in the catalogs, you will need to calculate tuple length and attribute offset information from the arguments passed to this method. After updating the catalogs, method RM_Manager::CreateFile should be called to create a file that will hold the tuples of the new relation.

RC DropTable (const char *relName)

This method should destroy the relation named *relName and all indexes on that relation. The indexes are found by accessing catalog attrcat, and the index files are destroyed by calling method IX_Manager::DestroyIndex. The file for the relation itself is destroyed by calling method RM_Manager::DestroyFile. Information about the destroyed relation should be deleted from catalogs relcat and attrcat.

RC CreateIndex (const char *relName, const char *attrName)

This method should create an index on attribute *attrName of relation *relName and build the index from the current contents of the relation. This method should first check (by accessing catalog attrcat) that there is not already an index on the specified attribute; if there is, a nonzero code should be returned. Catalog attrcat should be updated to reflect the new index, then method IX_Manager::CreateIndex should be called to create the index. Building the index consists of: (1) opening the index; (2) using RM component methods to scan through the records to be indexed, repeatedly calling IX_IndexHandle::InsertEntry; (3) closing the index.

RC DropIndex (const char *relName, const char *attrName)

This method should destroy the index on attribute *attrName of relation *relName. This method should first check (by accessing catalog attrcat) that the index exists; if not, a nonzero code should be returned. Catalog attrcat should be updated, and method IX_Manager::DestroyIndex should be called to destroy the index.

RC Load (const char *relName, const char *fileName)

This method should insert into the relation named *relName all tuples in the Unix ASCII file named *fileName. The tuples are formatted as described earlier. Method RM_Manager::OpenFile is called to open the relation file and method IX_Manager::OpenIndex is called to open each index. Then the tuples are read from the ASCII file one at a time, using schema information obtained from catalog attrcat. After reading each tuple, method RM_FileHandle::InsertRec is called to insert the tuple into the relation, and method IX_IndexHandle::InsertEntry is called for each index to make appropriate index entries for the tuple. If an error occurs during loading, this method should just return a nonzero code -- do not worry about leaving relations or indexes in a "half-loaded" state. After all the tuples are loaded, all opened files, including the ASCII load file, should be closed. Note that the ASCII load file is manipulated directly, not using the RedBase file interface.

RC Help ()

If Help is called with no arguments, then a list of all relations should be obtained and printed by scanning catalog relcat. For uniformity, this method must do all of its printing using the Printer class that we are providing, described below.

RC Help (const char *relName)

If Help is called with argument *relName, then information about the attributes in the named relation should be obtained and printed by accessing catalog attrcat. For uniformity, this method must do all of its printing using the Printer class that we are providing, described below.

RC Print (const char *relName)

This method should open the relation named *relName, print its contents by scanning the entire relation, print the total number of tuples returned, then close the relation. For uniformity, this method must do all of its printing using the Printer class that we are providing, described below.

RC Set (const char *paramName, const char *value)

This method is called when the user requests to set a system parameter, as described earlier. This method should set the parameter identified by argument paramName to the value specified by argument *value. Regardless of the type of the system parameter (integer, character, etc.), the value is passed to this method as a string, exactly as typed by the user (without the quotes).

SM_PrintError

void SM_PrintError (RC rc);

This routine should write a message associated with the nonzero SM return code rc onto the Unix stderr output stream. This routine has no return value.

Return Codes and Error Handling
Return codes and error handling will generally continue in the style you adopted for Parts 1 and 2 of the project. If the parser receives a nonzero return code after calling one of your SM component methods, it will use the value of the return code to call the appropriate component's PrintError routine. If the code is positive, the parser will then resume with the command loop under the assumption that the database is intact and processing can continue. If the code is negative, the parser will terminate the command loop.

Note: You should not call SM_PrintError directly from within the SM (or any other) component. SM_PrintError is called automatically by the parser when it receives a nonzero return code from an SM component method it invokes. You may, however, need to call SM_PrintError from your stand-alone programs implementing the command line utilities dbcreate, dbdestroy, and redbase.

Metadata Management
The internal schema information that describes the database relations, the format of their attributes, and which attributes have indexes, is stored in special system-managed relations called catalogs. Like all relations, files for the catalog relations can and should be manipulated using the methods you implemented in Parts 1 and 2.

For the basic project you will need only two catalogs: relcat and attrcat. Please do not change the names of these catalogs. These two catalog relations should be created when the "dbcreate DBname" command is executed, and they should be stored in the subdirectory called DBname (each database has its own catalogs). Since you will create and manipulate these relations using RM and perhaps IX component methods, the stand-alone code you write for the dbcreate command must be linked with your RM and IX components.

The relcat relation is used to keep track of all relations in the database. There is one tuple in relcat for each relation. Examples of the kind of information you might choose to include in each tuple of the relcat relation are:

relName relation name
tupleLength tuple length in bytes
attrCount number of attributes
indexCount number of indexed attributes

The actual information in relcat may vary from design to design -- you should include all information that you find useful.

The attrcat relation is used to keep track of all attributes of all relations in the database. There is one tuple in attrcat for each attribute of each relation. Examples of the kind of information you might choose to include in each tuple of the attrcat relation are:

relName this attribute's relation
attrName attribute name
offset offset in bytes from beginning of tuple
attrType attribute type
attrLenth attribute length
indexNo index number, or -1 if not indexed

Again, the actual information in attrcat may vary from design to design, and you should include all information that you find useful.

You should enable the user to access the catalog relations using standard commands, as if the catalogs were regular database relations. To do so, you will need to insert descriptions for relcat and attrcat into relcat and attrcat when you create them. Users should not, however, be permitted to load tuples into or drop either of the system catalogs.

Because the system catalogs are accessed very frequently, we suggest that you open the catalogs for a database when the database is opened, then keep the catalogs open until the database is closed. One effect of doing so is that updates to the catalogs may not be reflected onto disk immediately. Thus, if you open a catalog a second time (to implement the help utility, for example, or to print the contents of a catalog), then you may not see the most current version of the catalog. One solution to this problem is to call RM_FileHandle::ForcePages each time a catalog is changed -- don't forget to also flush any header information, and to call IX_FileHandle::ForcePages for any indexes you've created on the catalog.

Metadata Methods

You will almost certainly find it convenient for this component and for Part 4 to implement methods that obtain metadata information from catalogs relcat and/or attrcat for a specific relation or attribute. We have not included these methods in our interface for SM_Manager since the type of information returned may be dependent on how you organize your metadata. However, we do highly recommend that you define and implement such methods now, both for clean coding in the SM component and to save yourself extra work when you attack the QL component.

Setup and Files
Before beginning your work on the SM component you should run the setup script with argument "3" (for project part 3). The script will copy or link more than 20 files. Most of these files are for the parser, but we are also including "stubs" for the SM and QL components. The first thing you should do after running setup is to uncomment SM_SOURCES, QL_SOURCES, UTIL_SOURCES, and PARSER_SOURCES in Makefile to build the parser (libparser.a) and the stubs for the SM and QL components. To implement the SM component you will be replacing the SM stub and its temporary library (libsm.a) with your own. You will continue to use the QL stub (and libql.a) until project part 4. The QL stub is necessary so that executables involving the parser link properly. The files provided also include printer.h and printer.cc, which contain the Printer class
described below.

The Parser
The RedBase parser takes a command from the user, parses it, then calls an SM or QL component method to execute the command. After command execution is done, the parser prompts the user for a new command. For convenience, the parser also accepts Unix shell commands preceded by an exclamation mark ("!ls", for example), so you don't need to exit the RedBase system in order to do things such as see what your directory looks like. To exit the parser's command loop, type "exit;" at the prompt. parser_test allows you to experiment with RedBase prompt interaction and see in general how the parser operates. Remember that your redbase.cc needs to call function RBparse, so you must include parser.h in redbase.cc. Finally, library libql.a is necessary for any linking step that includes the parser.

The parser is implemented using flex and yacc. If you would like to modify the parser, the file called Parser.HowTo explains how to get started.

The Printer Class
We are providing a class for printing tuples of a relation one at a time. This class must be used by your SM_Manager::Help and SM_Manager::Print methods, and it will be required for some QL component methods as well.

   class Printer {
     public: 
       Printer(const DataAttrInfo *attributes, const int attrCount);
       ~Printer();         
       void PrintHeader(ostream &c) const;
       Void Print(ostream &c, const char * const data);
       void Print(ostream &c, const void * const data[]);
       void PrintFooter(ostream &c) const;
   };

Printer(const DataAttrInfo *attributes, const int attrCount);

The constructor takes a pointer to attrCount elements of DataAttrInfo, describing the schema of the tuples to be printed. The DataAttrInfo is used to display the header information and the tuples in a nice fashion.

void PrintHeader(ostream &c) const;

This method prints header information. Each attribute name is printed, along with its relation name if needed to disambiguate the attribute. A row of dashes is then printed. This method should be called once, before printing any actual tuples.

void Print(ostream &c, const char * const data);
void Print(ostream &c, const void * const data[]);

This method prints the tuple that is pointed to by char* data. For the QL component, this method also can receive an array of void* pointers, where each one points to an attribute value. (The utility of the second version will become evident when implementing the QL component.)

void PrintFooter(ostream &c) const;

This method prints "footer" information about the number of tuples printed. This method should be called once, after the last tuple has been printed.

As an example of how the Printer class is used, the following code segment is similar to what you will write in your SM_Manager::Print method. Note that this code is simplified, and you will need to fill in the details.

   DataAttrInfo *attributes;
   int attrCount;
   RM_FileHandle rfh;
   RM_Record rec;
   char *data;

   // Fill in the attributes structure, define the RM_FileHandle
   ...

   // Instantiate a Printer object and print the header information
   Printer p(attributes, attrCount);
   p.PrintHeader(cout);

   // Open the file and set up the file scan
   if ((rc=rmm->OpenFile(relName, rfh)))
      return(rc);

   RM_FileScan rfs;

   if ((rc = rfs.OpenScan(rfh, INT, sizeof(int), 0, NO_OP, NULL))) 
      return (rc);

   // Print each tuple
   while (rc!=RM_EOF) {
      rc = rfs.GetNextRec(rec);

      if (rc!=0 && rc!=RM_EOF)
         return (rc);

      if (rc!=RM_EOF) {
         rec.GetData(data);
       p.Print(cout, data);
      }
   }

   // Print the footer information
   p.PrintFooter(cout);

   // Close the scan, file, delete the attributes pointer, etc.
   ...

Additional Commands
In addition to the DDL commands and system utilities described earlier, and the DML commands you will implement in the QL component, we have exposed some administrative commands in the parser. You will find these commands particularly useful for testing and tuning your system's I/O performance, i.e., to gear up for the efficiency contest which is now only two components away. The commands are:

reset buffer; Remove all unpinned pages from the buffer pool
print buffer; Display simple information about pages in the buffer
resize buffer i; Resize the buffer pool to i pages (i an integer)
print io; Display the I/O statistics
reset io; Reset the I/O statistics

Documentation, Testing, Submission, Etc.
As usual: you are expected to include comments in your code; you should submit a 1-2 page description (in a plain text file sm_DOC) covering your design, key data structures, testing strategy, and known bugs, and citing any assistance you received; you will be asked some design questions by email. Part 3 will be submitted via the submit script in the same way you submitted Parts 1 and 2. As always, please compile using the -DPF_STATS flag, remember to run the "submit -c" script with argument "3" (for project part 3), and be sure to check the file submit.sm before issuing the final "submit -s 3" command. Your submission should include executables for dbcreate, dbdestrory, and redbase, which should be the result of running "make" from within your submission directory.

Because the SM component enables you to work with the interactive RedBase prompt and submit user-level commands, a separate test program is not required. However, you will need to create ASCII data files in order to bulk-load your relations, and you may find it convenient to use scripts for testing. There are some data files located in directory /usr/class/cs346/redbase/data/ that you may use if you like, but we strongly encourage you to create your own additional data files. We also have provided an initial test shell script sm_test and one example test sm_test.1, obtained when you run setup. The example test loads relations from the data files mentioned above. As always, you will need to generate many more tests to thoroughly exercise your code, and we will conduct tests of our own during the grading process. You should find that, beginning with this component, it's much more fun to test your system since you can now interact with it in user mode.