CS346 - Spring 2011
Database System Implementation

RedBase Part 4: The Query Language Component Due Sunday May 22

Introduction

The fourth part of the RedBase system you will implement -- the last part of the basic project -- is the Query Language (QL) component. This component implements the language RQL (for "RedBase Query Language"). RQL's data retrieval command is a restricted version of the SQL Select statement. RQL's data modification commands are restricted versions of SQL's Insert, Delete, and Update statements. The public methods of the QL component are called by the RedBase parser, similarly to the way SM component methods were called by the parser in Part 3. The QL component will use classes and methods from the IX, RM, and SM components. We'll first specify the syntax of RQL and explain what the RQL commands should do from a user's perspective. Then we'll specify the QL interface, which includes the methods that support the commands.

Warning or special treat, depending on your preference: Significantly more creativity and design work is required to implement this component than any of the previous components. We've defined the interface precisely, but there's lots to be done underneath the interface, and there are many different ways of doing it. We suggest that you think very carefully and thoroughly about how your program will operate before beginning implementation. In addition, if you're attempting to win the RedBase Efficiency Contest, then this component is one of the most important and is also your last chance! As mentioned above, there are numerous ways of implementing the QL interface -- some ways are much more efficient than others in terms of I/O, so you'll want to think, design, and code carefully in order to execute queries efficiently and best exploit the efficiency you've already built into your previous components.

Another warning: There's so much latitude in this part of the project, and so many techniques we've learned about in class that could "easily" be added to the basic QL functionality, that some students have a tendency to get overambitious. In the extreme case -- and it does happen every year -- students set their sights so high that they end up failing to get the basic functionality working. Don't let it happen to you.

The Language RQL

Once the RedBase system has been started up (by issuing the redbase command), RQL commands are recognized by the RedBase parser just as DDL and system utility commands were recognized in Part 3. Recall that all RedBase commands are terminated by a semicolon, and that command keywords are case-insensitive. Square brackets in the syntax specifications below indicate command components that are optional. Because RQL is a subset of the relational query language SQL, it is recommended that you review the SQL Select, Insert, Delete, and Update statements in your database textbook before beginning work on the QL component. If you would like to test SQL commands on a production relational DBMS in order to compare the results with your RedBase implementation, we recommend downloading the open-source DBMS MySQL, or using Microsoft Office Access on a PC.

The RQL Select command

The syntax of the one data retrieval command in RQL is:

Select A1, A2, ..., Am
From R1, R2, ..., Rn
[Where A1' comp1 AV1 And A2' comp2 AV2 And ... And Ak' comp-k AVk];

This command has the standard SQL interpretation: The result of the command is structured as a relation with attributes A1, A2, ..., Am. For each tuple t in the cross-product of relations R1, R2, ..., Rn in the From clause, if tuple t satisfies all conditions in the Where clause, then the attributes of tuple t listed in the Select clause are included in the result (in the listed order). If the Where clause is omitted, it is considered to be true always. Duplicate tuples in the result are not eliminated.

An alternative form of the Select command is:

Select *
From R1, R2, ..., Rn
[Where A1' comp1 AV1 And A2' comp2 AV2 And ... And Ak' comp-k AVk];

The interpretation of this command is the same as for the first form of the command, except that every attribute of each satisfying tuple t in the cross-product is returned. That is, the command is equivalent to one in which the "*" in the Select clause is replaced by a list of all attributes of all relations R1, R2, ..., Rn in the From clause.

Checks on the Select command

The parser will ensure that an RQL Select command is valid syntactically before calling the corresponding QL component method. In addition to requiring the basic Select-From-Where syntax given above, the parser will ensure:

Each Ai in the Select clause is specified as either relName.attrName or attrName, or the Select list consists of the single element "*".
Each Ai' in the Where clause is specified as either relName.attrName or attrName.
Each comp-i in the Where clause is one of the comparison operators =, <, >, <=, >=, or <>.
Each AVi in the Where clause is specified as relName.attrName or attrName, or it is a constant value. Integer values are specified as, e.g., 10 or -5, float values are specified as, e.g., 3.5E-3, and character string values are specified as, e.g., "Smith" (don't forget the quotes).

Upon receiving a syntactically valid query, your QL component code must perform the following semantic checks:

Each Ri in the From clause must name a relation in the database. A relation may not appear more than once in the From clause (since RQL does not support relation variables to distinguish multiple appearances).
For each Ai = [relName.]attrName in the Select clause, if relName is specified then it must be one of the relations in the From clause. If relName is not specified, then there must be exactly one relation in the From clause with an attribute named attrName. The same rule applies for each Ai' = [relName.]attrName and each AVi = [relName.]attrName in the Where clause.
For each Ai' comp-i AVi in the Where clause, the types of the two operands Ai' and AVi must be compatible with each other. (A discussion of type compatibility appears in the specification of the QL Interface below.)

Some of these checks also must be performed on Delete and Update commands, described below.

The RQL Insert Command

Tuples may be inserted into a relation one at a time (as opposed to bulk loading) using the following RQL command:

Insert Into relName Values (V1, V2, ..., Vn);

This command inserts into relation relName a new tuple with the specified attribute values. As in bulk loading, attribute values must appear in the same order as in the "create table" command that was executed for relation relName. Values are specified in the same way as in load files and the Select command. String values can be of any length up to the length specified for the corresponding attribute.

The RQL Delete Command

Tuples may be deleted from relations using the following RQL command:

Delete From relName
[Where A1 comp1 AV1 And A2 comp2 AV2 And ... And Ak comp-k AVk];

This command deletes from relation relName all tuples satisfying the Where clause. As in the Select command, the conditions in the Where clause compare attributes to constant values or to other attributes. In the Where clause, all attributes may be specified simply as attrName (since there is only one relName), although specifying the relName is not an error. Comparison operators and specification of values are the same as in the Select command. If the Where clause is omitted, all tuples in relName are deleted.

The RQL Update Command

Tuples in relations may be updated using the following RQL command:

Update relName
Set attrName = AV
[Where A1 comp1 AV1 And A2 comp2 AV2 And ... And Ak comp-k AVk];

This command updates in relation relName all tuples satisfying the Where clause. The Where clause is exactly as in the Delete command described above. AV on the right-hand side of the equality operator is either an attribute or a constant value. An attribute may be specified simply as attrName (since there is only one relName), although specifying the relName is not an error. A value is specified as in the Select command. Each updated tuple is modified so that its attribute attrName gets the value AV (if AV is a constant), or the value of the updated tuple's AV attribute (if AV is an attribute). If the Where clause is omitted, all tuples in relName are updated.

QL Interface

QL_Manager is the only class in the QL interface. It contains four public methods, corresponding to the four RQL commands described above. The QL interface also includes a QL_PrintError routine for printing messages associated with nonzero QL return codes. Before beginning work on the QL component you should run the setup script with argument "4" (for project part 4). It will copy a simple QL tester, similar to the one provided for the SM component, and it will recopy ql.h and ql_manager_stub.cc so that you have the latest versions of these files.

You should rename ql_manager_stub.cc as ql_manager.cc, then complete the implementation. As usual, all QL component public methods (except constructors and destructors) should return 0 if they complete normally and a nonzero return code otherwise. Parser handling of nonzero return codes is exactly the same as in Part 3: If the parser receives a nonzero return code it will use the value of the return code to call the appropriate component's PrintError routine. If the return code is positive, the parser will resume with the command loop; if the code is negative, the parser will terminate the command loop.

QL_Manager Class

As always, all necessary component initialization should take place within the constructor for the QL_Manager class, and all "cleanup" should take place within its destructor.
When implementing the public methods of the QL_Manager class, in addition to adhering to the method descriptions below, please refer to the RQL command descriptions given above.
If you treated user-specified names as case-sensitive (respectively, case-insensitive) in the SM component, then you should probably continue to do the same in this component. As in the SM component, all user-specified names are passed from the parser exactly as they are typed by the user.
The SM component interface from Part 3 exports the methods that implement the RedBase DDL and system utility commands, along with methods OpenDb and CloseDb. In the QL component you will almost certainly find a need to use other functionality you implemented in Part 3; surely you will at least need to obtain schema information from the system catalogs relcat and attrcat. Consequently, in order to implement Part 4, you will probably need to allow the QL component to access some classes and methods that you may have treated as private in the SM component for Part 3. (Depending on how you implemented SM, you may even find it convenient to enhance the SM component to some extent, for convenience in implementing QL methods.)
As usual, you are free to extend the QL interface if you find it convenient, although you should not modify the existing methods since they are called by the parser as specified.
All printing of relations (including query answers) in this component must be done through the Printer class we provided with the SM component. Please do not alter the way you print tuples or relations -- our test suite depends on conformance.
Values passed to QL component methods are handled as follows. The parser recognizes whether it is reading an integer, a float, or a string. It converts the input to the appropriate type; then when it calls the QL component method it passes a structure that contains the type of the value along with a pointer to the value cast as (void *). The type information allows you to check that the type of the value is the type expected (based on schema information), and you will need to cast the value back from (void *) to its appropriate type.
In attribute-constant and attribute-attribute comparisons, you are not required to perform coercion so that integers and floats can be compared, but you are welcome to do so if you wish. For strings, you should permit the case where the two values being compared are strings of different length. You can do so by null-terminating or null-padding the shorter one.
Before specifying the interface for the QL_Manager class, we define several structures that are used in passing parameters from the parser to QL_Manager methods. These structures are defined in parser.h. The ostream operator in each of these structures is provided so that structure values can be sent to standard output for debugging purposes (i.e., by writing "cout << <object>;").

struct RelAttr {
  char *relName;     // relation name (may be NULL) 
  char *attrName;    // attribute name              
  friend ostream &operator<<(ostream &s, const RelAttr &ra);
};

struct Value {
  AttrType type;     // type of value               
  void     *data;    // value                       
  friend ostream &operator<<(ostream &s, const Value &v);
};

struct Condition {
  RelAttr lhsAttr;      // left-hand side attribute                     
  CompOp  op;           // comparison operator                          
  int     bRhsIsAttr;   // TRUE if right-hand side is an attribute
                        //   and not a value
  RelAttr rhsAttr;      // right-hand side attribute if bRhsIsAttr = TRUE
  Value   rhsValue;     // right-hand side value if bRhsIsAttr = FALSE
  friend ostream &operator<<(ostream &s, const Condition &c);
};

class QL_Manager {
 public:
                                              // Constructor
      QL_Manager (SM_Manager &smm, IX_Manager &ixm, RM_Manager &rmm);
      ~QL_Manager ();                         // Destructor
   RC Select (int           nSelAttrs,        // # attrs in Select clause
              const RelAttr selAttrs[],       // attrs in Select clause
              int           nRelations,       // # relations in From clause
              const char * const relations[], // relations in From clause
              int           nConditions,      // # conditions in Where clause
              const Condition conditions[]);  // conditions in Where clause
   RC Insert (const char  *relName,           // relation to insert into
              int         nValues,            // # values to insert
              const Value values[]);          // values to insert
   RC Delete (const char *relName,            // relation to delete from
              int        nConditions,         // # conditions in Where clause
              const Condition conditions[]);  // conditions in Where clause
   RC Update (const char *relName,            // relation to update
              const RelAttr &updAttr,         // attribute to update
              const int bIsValue,             // 0/1 if RHS of = is attribute/value
              const RelAttr &rhsRelAttr,      // attr on RHS of =
              const Value &rhsValue,          // value on RHS of =
              int   nConditions,              // # conditions in Where clause
              const Condition conditions[]);  // conditions in Where clause
};

RC Select (int nSelAttrs, const RelAttr selAttrs[], int nRelations, const char * const relations[], int nConditions, const Condition conditions[])

The six arguments to this method logically comprise three pairs. The first argument of each pair is an integer n indicating the number of items in the second argument of the pair. The second argument is an array of length n containing the actual items. For example, argument nSelAttrs contains the number of selected attributes, while argument selAttrs is an array of length nSelAttrs containing the actual attributes. (A single attribute with string value "*" is passed for "Select *" queries.) The same approach is used for variable-length argument lists in other methods of the QL_Manager class.

Method QL_Manager::Select contains the core of the work you will do for the QL component. The main requirement is that when the user enters a valid RQL Select command, sooner or later the correct result according to the semantics described above should print on the screen. There are many options you will need to consider when implementing this method; some design requirements and suggestions follow.

Your implementation should build a query tree, which will first serve as your logical query plan and then will be transformed into a physical plan. We strongly suggest using an iterator approach for plan execution, although you may use a temporary relation approach instead if you have valid reasons for doing so.

Regardless of other design decisions you make, we would like your query execution strategy to be able to take advantage of indexes whenever it is (relatively) straightforward to do so. In particular:

Suppose the query includes a local selection condition of the form R.A=v, where v is a constant value. If relation R has an index on attribute A, then an IX component index scan together with calls to method RM_FileHandle::GetRec can be used to fetch the tuples of R, rather than using an RM component file scan. (Hereafter, we use "index scan" to mean an IX component condition-based scan together with tuple fetching, and "file scan" to mean an RM component scan.)
You may use a simple nested-loop algorithm for each join or cross-product. Suppose you are joining R with S based on attribute equality, suppose S has no local selection condition, and suppose S has an index on its join attribute A. Then you can make S the inner relation of the nested-loop join and use an index scan on S instead of a file scan. (You need not, however, attempt to reorder the relations in the query in order to make such index joins possible.)

When returned to the user, a query result should look like a relation: a header, followed by a set of tuples, followed by a summary of the number of tuples returned. You must use the Printer class we provided in project part 3 to print RQL query results.

RC Insert (const char *relName, int nValues, const Value values[])

This method should create a new tuple in relation *relName with the specified attribute values. Your code should confirm that the number and type of values in arguments nValues and values matches the schema of the relation. This method should construct the tuple, call method RM_FileHandle::InsertRec to insert the tuple, then call method IX_IndexHandle::InsertEntry for each index to create appropriate index entries for the tuple. To provide feedback to the user, this method should also print the inserted tuple on the screen (using the Printer class).

RC Delete (const char *relName, int nConditions, const Condition conditions[])

This method should delete all tuples in relation *relName that satisfy the specified conditions. If there are no conditions, then this method should delete all tuples in *relName. (Note that even if all tuples are deleted, the relation should not be destroyed, it should just become empty.) Recall that within argument conditions, attributes may be specified by attribute name only (i.e., relName may be NULL in RelAttr structures), since there is only one relation involved.

Any number of conditions is allowed. Let R be relation *relName, suppose there is a condition of the form R.A=v where v is a constant value, and suppose R has an index on attribute A. Then you can use an index scan to efficiently fetch those tuples of R satisfying R.A=v. For each fetched tuple, check the remaining conditions and, if they are all satisfied, delete the tuple. If there is no condition with an appropriate index, then a file scan should be used. Each tuple is fetched, the conditions are checked, and if satisfied the tuple is deleted. Don't forget to delete corresponding index entries when you delete a tuple.

Although the execution strategies for Delete commands generally are simpler than for Select commands, you should still build physical query plans for Delete. You will probably find that some physical operators from your implementation of Select can be reused.

To provide feedback, this method should print the deleted tuples on the screen (using the Printer class).

RC Update (const char *relName, const RelAttr &updAttr, const int bIsValue, const RelAttr &rhsRelAttr, const Value &rhsValue, int nConditions, const Condition conditions[]);

This method should update all tuples in relation *relName that satisfy the specified conditions, setting the value of attribute *updAttr in each updated tuple to a new value. If bIsValue=1 then the new updated value should be the constant in rhsValue, otherwise the new updated value should be taken from the attribute specified by rhsRelAttr. If there are no conditions, then this method should update all tuples in *relName. Conditions are exactly as in method QL_Manager::Delete. You can use the same strategy for exploiting an index in this method as was described for QL_Manager::Delete, with one important exception: don't scan an index on the attribute being updated unless you have accounted for this case in your IX component! If you are updating an indexed attribute, don't forget to delete and then insert index entries for each updated tuple. You should build physical query plans for Update commands, and again you will probably find that you can use reuse previously implemented physical operators. To provide feedback, this method should print the updated tuples on the screen (using the Printer class).

QL_PrintError

void QL_PrintError (RC rc);

This routine should write a message associated with the nonzero QL return code rc onto the Unix stderr output stream. This routine has no return value. As in the SM component, you should not call QL_PrintError directly from within the QL (or any other) component. QL_PrintError is called automatically by the parser when it receives a nonzero return code from an SM component method it invokes.

Printing Query Plans

Your implementations of the RQL Select, Delete, and Update commands should ultimately build a physical query plan before actually executing the query. Query plans are not required for Insert commands. Your system should provide a way to pretty-print physical query plans for Select, Delete, and Update commands before the command is executed. You will probably want to render tree-structured plans using some form of indentation.

Query plans should be printed only when the value of special global variable bQueryPlans is 1. Two administrative commands are provided in the parser for toggling the bQueryPlans variable:

queryplans on; Turn on the display of query plans
queryplans off; Turn off the display of query plans

You must #include "parser.h" in ql_manager.cc, or any other files that access the bQueryPlans variable.

For full credit in this component of the project, the TA must be able to easily understand your query execution strategy for any RQL Select, Delete, or Update command based upon the pretty-printed plan information. But again, please do not print plans unless variable bQueryPlans is set.

Miscellaneous

If you have been following our suggestions, then the system catalog relations relcat and attrcat will be open at all times during a RedBase session, and these relations will be the only ones left open when a command is not underway. You should allow RQL data retrieval commands to be performed on system catalogs. As with the help and print utilities in the SM component, you may want to force to disk the files containing a catalog before executing an RQL query over it, unless you have adopted a policy of forcing catalogs whenever they are modified. Users should not be permitted to modify system catalogs using RQL commands.

When implementing the nested-loop join operator as an iterator, do not open and close the RM file for the inner relation on each open/close iterator call, since all pages are flushed from the buffer when a file is closed. Instead, create one RM_FileHandle and use it to make repeated RM file scans.

Documentation, Testing, Submission, Etc.

The requirements for documentation and the method of submission are the same as for previous project parts, including citing in your ql_DOC file any assistance that you received on this component. In this submission please remember to include all updated source files and the three executables: redbase, dbcreate, and dbdestroy. All executables that you turn in must be compiled with statistics on.
Remember to thoroughly test your system with large relations and complex queries and updates. (For example, we strongly suggest that you devise your own data in addition to the data provided in our sample files.) As usual, we will conduct tests of our own during the grading process.
At the same time that we test each student's QL component for grading purposes, we will be benchmarking each student's complete system to determine the winners of the efficiency contest. If you prefer that we not benchmark your system for the contest, please notify the TA.

`queryplans on;`	Turn on the display of query plans
`queryplans off;`	Turn off the display of query plans