CS346 - Spring 2011
Database System Implementation

RedBase Part 4: The Query Language Component
Due Sunday May 22

Introduction
The fourth part of the RedBase system you will implement -- the last part of the basic project -- is the Query Language (QL) component. This component implements the language RQL (for "RedBase Query Language"). RQL's data retrieval command is a restricted version of the SQL Select statement. RQL's data modification commands are restricted versions of SQL's Insert, Delete, and Update statements. The public methods of the QL component are called by the RedBase parser, similarly to the way SM component methods were called by the parser in Part 3. The QL component will use classes and methods from the IX, RM, and SM components. We'll first specify the syntax of RQL and explain what the RQL commands should do from a user's perspective. Then we'll specify the QL interface, which includes the methods that support the commands.

Warning or special treat, depending on your preference: Significantly more creativity and design work is required to implement this component than any of the previous components. We've defined the interface precisely, but there's lots to be done underneath the interface, and there are many different ways of doing it. We suggest that you think very carefully and thoroughly about how your program will operate before beginning implementation. In addition, if you're attempting to win the RedBase Efficiency Contest, then this component is one of the most important and is also your last chance! As mentioned above, there are numerous ways of implementing the QL interface -- some ways are much more efficient than others in terms of I/O, so you'll want to think, design, and code carefully in order to execute queries efficiently and best exploit the efficiency you've already built into your previous components.

Another warning: There's so much latitude in this part of the project, and so many techniques we've learned about in class that could "easily" be added to the basic QL functionality, that some students have a tendency to get overambitious. In the extreme case -- and it does happen every year -- students set their sights so high that they end up failing to get the basic functionality working. Don't let it happen to you.

The Language RQL
Once the RedBase system has been started up (by issuing the redbase command), RQL commands are recognized by the RedBase parser just as DDL and system utility commands were recognized in Part 3. Recall that all RedBase commands are terminated by a semicolon, and that command keywords are case-insensitive. Square brackets in the syntax specifications below indicate command components that are optional. Because RQL is a subset of the relational query language SQL, it is recommended that you review the SQL Select, Insert, Delete, and Update statements in your database textbook before beginning work on the QL component. If you would like to test SQL commands on a production relational DBMS in order to compare the results with your RedBase implementation, we recommend downloading the open-source DBMS
MySQL, or using Microsoft Office Access on a PC.

The RQL Select command

The syntax of the one data retrieval command in RQL is:
Select A1, A2, ..., Am
From R1, R2, ..., Rn
[Where A1' comp1 AV1 And A2' comp2 AV2 And ... And Ak' comp-k AVk];
This command has the standard SQL interpretation: The result of the command is structured as a relation with attributes A1, A2, ..., Am. For each tuple t in the cross-product of relations R1, R2, ..., Rn in the From clause, if tuple t satisfies all conditions in the Where clause, then the attributes of tuple t listed in the Select clause are included in the result (in the listed order). If the Where clause is omitted, it is considered to be true always. Duplicate tuples in the result are not eliminated.

An alternative form of the Select command is:

Select *
From R1, R2, ..., Rn
[Where A1' comp1 AV1 And A2' comp2 AV2 And ... And Ak' comp-k AVk];
The interpretation of this command is the same as for the first form of the command, except that every attribute of each satisfying tuple t in the cross-product is returned. That is, the command is equivalent to one in which the "*" in the Select clause is replaced by a list of all attributes of all relations R1, R2, ..., Rn in the From clause.

Checks on the Select command

The parser will ensure that an RQL Select command is valid syntactically before calling the corresponding QL component method. In addition to requiring the basic Select-From-Where syntax given above, the parser will ensure:

Upon receiving a syntactically valid query, your QL component code must perform the following semantic checks:

Some of these checks also must be performed on Delete and Update commands, described below.

The RQL Insert Command

Tuples may be inserted into a relation one at a time (as opposed to bulk loading) using the following RQL command:
Insert Into relName Values (V1, V2, ..., Vn);
This command inserts into relation relName a new tuple with the specified attribute values. As in bulk loading, attribute values must appear in the same order as in the "create table" command that was executed for relation relName. Values are specified in the same way as in load files and the Select command. String values can be of any length up to the length specified for the corresponding attribute.

The RQL Delete Command

Tuples may be deleted from relations using the following RQL command:
Delete From relName
[Where A1 comp1 AV1 And A2 comp2 AV2 And ... And Ak comp-k AVk];
This command deletes from relation relName all tuples satisfying the Where clause. As in the Select command, the conditions in the Where clause compare attributes to constant values or to other attributes. In the Where clause, all attributes may be specified simply as attrName (since there is only one relName), although specifying the relName is not an error. Comparison operators and specification of values are the same as in the Select command. If the Where clause is omitted, all tuples in relName are deleted.

The RQL Update Command

Tuples in relations may be updated using the following RQL command:
Update relName
Set attrName = AV
[Where A1 comp1 AV1 And A2 comp2 AV2 And ... And Ak comp-k AVk];
This command updates in relation relName all tuples satisfying the Where clause. The Where clause is exactly as in the Delete command described above. AV on the right-hand side of the equality operator is either an attribute or a constant value. An attribute may be specified simply as attrName (since there is only one relName), although specifying the relName is not an error. A value is specified as in the Select command. Each updated tuple is modified so that its attribute attrName gets the value AV (if AV is a constant), or the value of the updated tuple's AV attribute (if AV is an attribute). If the Where clause is omitted, all tuples in relName are updated.

QL Interface
QL_Manager is the only class in the QL interface. It contains four public methods, corresponding to the four RQL commands described above. The QL interface also includes a QL_PrintError routine for printing messages associated with nonzero QL return codes. Before beginning work on the QL component you should run the setup script with argument "4" (for project part 4). It will copy a simple QL tester, similar to the one provided for the SM component, and it will recopy ql.h and ql_manager_stub.cc so that you have the latest versions of these files.

You should rename ql_manager_stub.cc as ql_manager.cc, then complete the implementation. As usual, all QL component public methods (except constructors and destructors) should return 0 if they complete normally and a nonzero return code otherwise. Parser handling of nonzero return codes is exactly the same as in Part 3: If the parser receives a nonzero return code it will use the value of the return code to call the appropriate component's PrintError routine. If the return code is positive, the parser will resume with the command loop; if the code is negative, the parser will terminate the command loop.

QL_Manager Class

struct RelAttr {
  char *relName;     // relation name (may be NULL) 
  char *attrName;    // attribute name              
  friend ostream &operator<<(ostream &s, const RelAttr &ra);
};

struct Value {
  AttrType type;     // type of value               
  void     *data;    // value                       
  friend ostream &operator<<(ostream &s, const Value &v);
};

struct Condition {
  RelAttr lhsAttr;      // left-hand side attribute                     
  CompOp  op;           // comparison operator                          
  int     bRhsIsAttr;   // TRUE if right-hand side is an attribute
                        //   and not a value
  RelAttr rhsAttr;      // right-hand side attribute if bRhsIsAttr = TRUE
  Value   rhsValue;     // right-hand side value if bRhsIsAttr = FALSE
  friend ostream &operator<<(ostream &s, const Condition &c);
};

class QL_Manager {
 public:
                                              // Constructor
      QL_Manager (SM_Manager &smm, IX_Manager &ixm, RM_Manager &rmm);
      ~QL_Manager ();                         // Destructor
   RC Select (int           nSelAttrs,        // # attrs in Select clause
              const RelAttr selAttrs[],       // attrs in Select clause
              int           nRelations,       // # relations in From clause
              const char * const relations[], // relations in From clause
              int           nConditions,      // # conditions in Where clause
              const Condition conditions[]);  // conditions in Where clause
   RC Insert (const char  *relName,           // relation to insert into
              int         nValues,            // # values to insert
              const Value values[]);          // values to insert
   RC Delete (const char *relName,            // relation to delete from
              int        nConditions,         // # conditions in Where clause
              const Condition conditions[]);  // conditions in Where clause
   RC Update (const char *relName,            // relation to update
              const RelAttr &updAttr,         // attribute to update
              const int bIsValue,             // 0/1 if RHS of = is attribute/value
              const RelAttr &rhsRelAttr,      // attr on RHS of =
              const Value &rhsValue,          // value on RHS of =
              int   nConditions,              // # conditions in Where clause
              const Condition conditions[]);  // conditions in Where clause
};

RC Select (int nSelAttrs, const RelAttr selAttrs[], int nRelations, const char * const relations[], int nConditions, const Condition conditions[])

The six arguments to this method logically comprise three pairs. The first argument of each pair is an integer n indicating the number of items in the second argument of the pair. The second argument is an array of length n containing the actual items. For example, argument nSelAttrs contains the number of selected attributes, while argument selAttrs is an array of length nSelAttrs containing the actual attributes. (A single attribute with string value "*" is passed for "Select *" queries.) The same approach is used for variable-length argument lists in other methods of the QL_Manager class.

Method QL_Manager::Select contains the core of the work you will do for the QL component. The main requirement is that when the user enters a valid RQL Select command, sooner or later the correct result according to the semantics described above should print on the screen. There are many options you will need to consider when implementing this method; some design requirements and suggestions follow.

Your implementation should build a query tree, which will first serve as your logical query plan and then will be transformed into a physical plan. We strongly suggest using an iterator approach for plan execution, although you may use a temporary relation approach instead if you have valid reasons for doing so.

Regardless of other design decisions you make, we would like your query execution strategy to be able to take advantage of indexes whenever it is (relatively) straightforward to do so. In particular:

When returned to the user, a query result should look like a relation: a header, followed by a set of tuples, followed by a summary of the number of tuples returned. You must use the Printer class we provided in project part 3 to print RQL query results.

RC Insert (const char *relName, int nValues, const Value values[])

This method should create a new tuple in relation *relName with the specified attribute values. Your code should confirm that the number and type of values in arguments nValues and values matches the schema of the relation. This method should construct the tuple, call method RM_FileHandle::InsertRec to insert the tuple, then call method IX_IndexHandle::InsertEntry for each index to create appropriate index entries for the tuple. To provide feedback to the user, this method should also print the inserted tuple on the screen (using the Printer class).

RC Delete (const char *relName, int nConditions, const Condition conditions[])

This method should delete all tuples in relation *relName that satisfy the specified conditions. If there are no conditions, then this method should delete all tuples in *relName. (Note that even if all tuples are deleted, the relation should not be destroyed, it should just become empty.) Recall that within argument conditions, attributes may be specified by attribute name only (i.e., relName may be NULL in RelAttr structures), since there is only one relation involved.

Any number of conditions is allowed. Let R be relation *relName, suppose there is a condition of the form R.A=v where v is a constant value, and suppose R has an index on attribute A. Then you can use an index scan to efficiently fetch those tuples of R satisfying R.A=v. For each fetched tuple, check the remaining conditions and, if they are all satisfied, delete the tuple. If there is no condition with an appropriate index, then a file scan should be used. Each tuple is fetched, the conditions are checked, and if satisfied the tuple is deleted. Don't forget to delete corresponding index entries when you delete a tuple.

Although the execution strategies for Delete commands generally are simpler than for Select commands, you should still build physical query plans for Delete. You will probably find that some physical operators from your implementation of Select can be reused.

To provide feedback, this method should print the deleted tuples on the screen (using the Printer class).

RC Update (const char *relName, const RelAttr &updAttr, const int bIsValue, const RelAttr &rhsRelAttr, const Value &rhsValue, int nConditions, const Condition conditions[]);

This method should update all tuples in relation *relName that satisfy the specified conditions, setting the value of attribute *updAttr in each updated tuple to a new value. If bIsValue=1 then the new updated value should be the constant in rhsValue, otherwise the new updated value should be taken from the attribute specified by rhsRelAttr. If there are no conditions, then this method should update all tuples in *relName. Conditions are exactly as in method QL_Manager::Delete. You can use the same strategy for exploiting an index in this method as was described for QL_Manager::Delete, with one important exception: don't scan an index on the attribute being updated unless you have accounted for this case in your IX component! If you are updating an indexed attribute, don't forget to delete and then insert index entries for each updated tuple. You should build physical query plans for Update commands, and again you will probably find that you can use reuse previously implemented physical operators. To provide feedback, this method should print the updated tuples on the screen (using the Printer class).

QL_PrintError

void QL_PrintError (RC rc);

This routine should write a message associated with the nonzero QL return code rc onto the Unix stderr output stream. This routine has no return value. As in the SM component, you should not call QL_PrintError directly from within the QL (or any other) component. QL_PrintError is called automatically by the parser when it receives a nonzero return code from an SM component method it invokes.

Printing Query Plans
Your implementations of the RQL Select, Delete, and Update commands should ultimately build a physical query plan before actually executing the query. Query plans are not required for Insert commands. Your system should provide a way to pretty-print physical query plans for Select, Delete, and Update commands before the command is executed. You will probably want to render tree-structured plans using some form of indentation.

Query plans should be printed only when the value of special global variable bQueryPlans is 1. Two administrative commands are provided in the parser for toggling the bQueryPlans variable:

queryplans on; Turn on the display of query plans
queryplans off; Turn off the display of query plans

You must #include "parser.h" in ql_manager.cc, or any other files that access the bQueryPlans variable.

For full credit in this component of the project, the TA must be able to easily understand your query execution strategy for any RQL Select, Delete, or Update command based upon the pretty-printed plan information. But again, please do not print plans unless variable bQueryPlans is set.

Miscellaneous

Documentation, Testing, Submission, Etc.