CS346 - Spring 2011
Database System Implementation
RedBase Part 4: The Query Language Component
Due Sunday May 22
|
The fourth part of the RedBase system you will implement -- the last
part of the basic project -- is the Query Language (QL)
component. This component implements the language RQL (for
"RedBase Query Language"). RQL's data retrieval command is a
restricted version of the SQL Select statement. RQL's data
modification commands are restricted versions of SQL's
Insert, Delete, and Update statements. The
public methods of the QL component are called by the RedBase parser,
similarly to the way SM component methods were called by the parser in
Part 3. The QL component will use classes and methods from the IX,
RM, and SM components. We'll first specify the syntax of RQL and
explain what the RQL commands should do from a user's perspective.
Then we'll specify the QL interface, which includes the methods that
support the commands.
Warning or special treat, depending on your
preference: Significantly more creativity and design work is required
to implement this component than any of the previous components.
We've defined the interface precisely, but there's lots to be done
underneath the interface, and there are many different ways of doing
it. We suggest that you think very carefully and thoroughly about how
your program will operate before beginning implementation. In
addition, if you're attempting to win the RedBase Efficiency Contest,
then this component is one of the most important and is also your last
chance! As mentioned above, there are numerous ways of implementing
the QL interface -- some ways are much more efficient than others in
terms of I/O, so you'll want to think, design, and code carefully in
order to execute queries efficiently and best exploit the efficiency
you've already built into your previous components.
Another warning: There's so much latitude in this part of
the project, and so many techniques we've learned about in class that
could "easily" be added to the basic QL functionality, that some
students have a tendency to get overambitious. In the extreme case --
and it does happen every year -- students set their sights so high that
they end up failing to get the basic functionality working. Don't let
it happen to you.
Once the RedBase system has been started up (by issuing the
redbase command), RQL commands are recognized by the RedBase
parser just as DDL and system utility commands were recognized in Part
3. Recall that all RedBase commands are terminated by a semicolon,
and that command keywords are case-insensitive. Square brackets in
the syntax specifications below indicate command components that are
optional. Because RQL is a subset of the relational query language
SQL, it is recommended that you review the SQL Select,
Insert, Delete, and Update statements in
your database textbook before beginning work on the QL component. If
you would like to test SQL commands on a production relational DBMS in
order to compare the results with your RedBase implementation, we
recommend downloading the open-source DBMS MySQL, or using
Microsoft Office Access on a PC.
The RQL Select command
The syntax of the one data retrieval command in RQL is:
Select A1, A2, ..., Am
From R1, R2, ..., Rn
[Where A1' comp1 AV1 And A2' comp2 AV2 And ... And Ak' comp-k AVk];
This command has the standard SQL interpretation: The result of the
command is structured as a relation with attributes A1,
A2, ..., Am. For each tuple t in the
cross-product of relations R1, R2, ..., Rn in the
From clause, if tuple t satisfies all conditions in
the Where clause, then the attributes of tuple t
listed in the Select clause are included in the result (in
the listed order). If the Where clause is omitted, it is
considered to be true always. Duplicate tuples in the result
are not eliminated.
An alternative form of the Select command is:
Select *
From R1, R2, ..., Rn
[Where A1' comp1 AV1 And A2' comp2 AV2 And ... And Ak' comp-k AVk];
The interpretation of this command is the same as for the first form
of the command, except that every attribute of each satisfying tuple
t in the cross-product is returned. That is, the command is
equivalent to one in which the "*" in the Select
clause is replaced by a list of all attributes of all relations
R1, R2, ..., Rn in the From clause.
Checks on the Select command
The parser will ensure that an RQL Select command is valid
syntactically before calling the corresponding QL component method.
In addition to requiring the basic Select-From-Where syntax
given above, the parser will ensure:
- Each Ai in the Select clause is specified as
either relName.attrName or attrName, or the
Select list consists of the single element "*".
- Each Ai' in the Where clause is specified as
either relName.attrName or attrName.
- Each comp-i in the Where clause is one of the
comparison operators =, <, >, <=,
>=, or <>.
- Each AVi in the Where clause is specified as
relName.attrName or attrName, or it is a constant
value. Integer values are specified as, e.g., 10 or
-5, float values are specified as, e.g., 3.5E-3, and
character string values are specified as, e.g., "Smith"
(don't forget the quotes).
Upon receiving a syntactically valid query, your QL component code
must perform the following semantic checks:
- Each Ri in the From clause must name a relation
in the database. A relation may not appear more than once in the
From clause (since RQL does not support relation variables to
distinguish multiple appearances).
- For each Ai = [relName.]attrName in the
Select clause, if relName is specified then it must be
one of the relations in the From clause. If relName
is not specified, then there must be exactly one relation in the
From clause with an attribute named attrName. The
same rule applies for each Ai' =
[relName.]attrName and each AVi =
[relName.]attrName in the Where clause.
- For each Ai' comp-i AVi in the
Where clause, the types of the two operands Ai' and
AVi must be compatible with each other. (A discussion of type
compatibility appears in the specification of the
QL Interface below.)
Some of these checks also must be performed on Delete and
Update commands, described below.
The RQL Insert Command
Tuples may be inserted into a relation one at a time (as opposed to
bulk loading) using the following RQL command:
Insert Into relName Values (V1, V2, ..., Vn);
This command inserts into relation relName a new tuple with the
specified attribute values. As in bulk loading, attribute values must
appear in the same order as in the "create table" command that
was executed for relation relName. Values are specified in the
same way as in load files and the Select command. String
values can be of any length up to the length specified for the
corresponding attribute.
The RQL Delete Command
Tuples may be deleted from relations using the following RQL command:
Delete From relName
[Where A1 comp1 AV1 And A2 comp2 AV2 And ... And Ak comp-k AVk];
This command deletes from relation relName all tuples
satisfying the Where clause. As in the Select
command, the conditions in the Where clause compare
attributes to constant values or to other attributes. In the
Where clause, all attributes may be specified simply as
attrName (since there is only one relName), although
specifying the relName is not an error. Comparison operators
and specification of values are the same as in the Select
command. If the Where clause is omitted, all tuples in
relName are deleted.
The RQL Update Command
Tuples in relations may be updated using the following RQL command:
Update relName
Set attrName = AV
[Where A1 comp1 AV1 And A2 comp2 AV2 And ... And Ak comp-k AVk];
This command updates in relation relName all tuples satisfying
the Where clause. The Where clause is exactly as in
the Delete command described above. AV on the
right-hand side of the equality operator is either an attribute or a
constant value. An attribute may be specified simply as
attrName (since there is only one relName), although
specifying the relName is not an error. A value is specified
as in the Select command. Each updated tuple is modified so
that its attribute attrName gets the value AV (if
AV is a constant), or the value of the updated tuple's
AV attribute (if AV is an attribute). If the
Where clause is omitted, all tuples in relName are
updated.
QL_Manager is the only class in the QL interface. It
contains four public methods, corresponding to the four RQL commands
described above. The QL interface also includes a
QL_PrintError routine for printing messages associated with
nonzero QL return codes. Before beginning work on the QL component
you should run the setup script with argument "4"
(for project part 4). It will copy a simple QL tester, similar to the
one provided for the SM component, and it will recopy ql.h
and ql_manager_stub.cc so that you have the latest versions
of these files.
You should rename ql_manager_stub.cc as
ql_manager.cc, then complete the implementation. As usual,
all QL component public methods (except constructors and destructors)
should return 0 if they complete normally and a nonzero return code
otherwise. Parser handling of nonzero return codes is exactly the
same as in Part 3: If the parser receives a nonzero return code it
will use the value of the return code to call the appropriate
component's PrintError routine. If the return code is
positive, the parser will resume with the command loop; if the code is
negative, the parser will terminate the command loop.
QL_Manager Class
- As always, all necessary component initialization should take
place within the constructor for the QL_Manager class, and
all "cleanup" should take place within its destructor.
- When implementing the public methods of the QL_Manager
class, in addition to adhering to the method descriptions below,
please refer to the RQL command descriptions given above.
- If you treated user-specified names as case-sensitive
(respectively, case-insensitive) in the SM component, then you should
probably continue to do the same in this component. As in the SM
component, all user-specified names are passed from the parser exactly
as they are typed by the user.
- The SM component interface from Part 3 exports the methods
that implement the RedBase DDL and system utility commands, along with
methods OpenDb and CloseDb. In the QL component you
will almost certainly find a need to use other functionality you
implemented in Part 3; surely you will at least need to obtain schema
information from the system catalogs relcat and
attrcat. Consequently, in order to implement Part 4, you
will probably need to allow the QL component to access some classes
and methods that you may have treated as private in the SM component
for Part 3. (Depending on how you implemented SM, you may even find
it convenient to enhance the SM component to some extent, for
convenience in implementing QL methods.)
- As usual, you are free to extend the QL interface if you find it
convenient, although you should not modify the existing methods since
they are called by the parser as specified.
- All printing of relations (including query answers) in this
component must be done through the Printer class we
provided with the SM component. Please do not alter the way you print
tuples or relations -- our test suite depends on conformance.
- Values passed to QL component methods are handled as follows.
The parser recognizes whether it is reading an integer, a float, or a
string. It converts the input to the appropriate type; then when it
calls the QL component method it passes a structure that contains the
type of the value along with a pointer to the value cast as (void
*). The type information allows you to check that the type of the
value is the type expected (based on schema information), and you will
need to cast the value back from (void *) to its appropriate
type.
- In attribute-constant and attribute-attribute comparisons, you
are not required to perform coercion so that integers and floats can
be compared, but you are welcome to do so if you wish. For
strings, you should permit the case where the two values being
compared are strings of different length. You can do so by
null-terminating or null-padding the shorter one.
- Before specifying the interface for the QL_Manager
class, we define several structures that are used in passing
parameters from the parser to QL_Manager methods. These
structures are defined in parser.h. The ostream
operator in each of these structures is provided so that structure
values can be sent to standard output for debugging purposes (i.e., by
writing "cout << <object>;").
struct RelAttr {
char *relName; // relation name (may be NULL)
char *attrName; // attribute name
friend ostream &operator<<(ostream &s, const RelAttr &ra);
};
struct Value {
AttrType type; // type of value
void *data; // value
friend ostream &operator<<(ostream &s, const Value &v);
};
struct Condition {
RelAttr lhsAttr; // left-hand side attribute
CompOp op; // comparison operator
int bRhsIsAttr; // TRUE if right-hand side is an attribute
// and not a value
RelAttr rhsAttr; // right-hand side attribute if bRhsIsAttr = TRUE
Value rhsValue; // right-hand side value if bRhsIsAttr = FALSE
friend ostream &operator<<(ostream &s, const Condition &c);
};
class QL_Manager {
public:
// Constructor
QL_Manager (SM_Manager &smm, IX_Manager &ixm, RM_Manager &rmm);
~QL_Manager (); // Destructor
RC Select (int nSelAttrs, // # attrs in Select clause
const RelAttr selAttrs[], // attrs in Select clause
int nRelations, // # relations in From clause
const char * const relations[], // relations in From clause
int nConditions, // # conditions in Where clause
const Condition conditions[]); // conditions in Where clause
RC Insert (const char *relName, // relation to insert into
int nValues, // # values to insert
const Value values[]); // values to insert
RC Delete (const char *relName, // relation to delete from
int nConditions, // # conditions in Where clause
const Condition conditions[]); // conditions in Where clause
RC Update (const char *relName, // relation to update
const RelAttr &updAttr, // attribute to update
const int bIsValue, // 0/1 if RHS of = is attribute/value
const RelAttr &rhsRelAttr, // attr on RHS of =
const Value &rhsValue, // value on RHS of =
int nConditions, // # conditions in Where clause
const Condition conditions[]); // conditions in Where clause
};
RC Select (int nSelAttrs, const RelAttr selAttrs[],
int nRelations, const char * const relations[],
int nConditions, const Condition conditions[])
The six arguments to this method logically comprise three pairs. The
first argument of each pair is an integer n indicating the
number of items in the second argument of the pair. The second
argument is an array of length n containing the actual items.
For example, argument nSelAttrs contains the number of
selected attributes, while argument selAttrs is an array of
length nSelAttrs containing the actual attributes. (A single
attribute with string value "*" is passed for "Select *"
queries.) The same approach is used for variable-length argument
lists in other methods of the QL_Manager class.
Method QL_Manager::Select contains the core of the work
you will do for the QL component. The main requirement is that when
the user enters a valid RQL Select command, sooner or later
the correct result according to the semantics described above should
print on the screen. There are many options you will need to consider
when implementing this method; some design requirements and
suggestions follow.
Your implementation should build a query tree, which will first
serve as your logical query plan and then will be transformed into a
physical plan. We strongly suggest using an iterator approach for
plan execution, although you may use a temporary relation approach
instead if you have valid reasons for doing so.
Regardless of other design decisions you make, we would like your
query execution strategy to be able to take advantage of indexes
whenever it is (relatively) straightforward to do so. In particular:
- Suppose the query includes a local selection condition of the
form R.A=v, where v is a constant value.
If relation R has an index on attribute A, then an IX
component index scan together with calls to method
RM_FileHandle::GetRec can be used to fetch the tuples of
R, rather than using an RM component file scan. (Hereafter, we
use "index scan" to mean an IX component condition-based scan together
with tuple fetching, and "file scan" to mean an RM component scan.)
- You may use a simple nested-loop algorithm for each join or
cross-product. Suppose you are joining R with S based
on attribute equality, suppose S has no local selection
condition, and suppose S has an index on its join attribute
A. Then you can make S the inner relation of the
nested-loop join and use an index scan on S instead of a file
scan. (You need not, however, attempt to reorder the relations in the
query in order to make such index joins possible.)
When returned to the user, a query result should look like a relation:
a header, followed by a set of tuples, followed by a summary of the
number of tuples returned. You must use the Printer
class we provided in project part 3 to print RQL query results.
RC Insert (const char *relName, int nValues, const Value values[])
This method should create a new tuple in relation *relName
with the specified attribute values. Your code should confirm that
the number and type of values in arguments nValues and
values matches the schema of the relation. This method
should construct the tuple, call method
RM_FileHandle::InsertRec to insert the tuple, then call
method IX_IndexHandle::InsertEntry for each index to create
appropriate index entries for the tuple. To provide feedback to the
user, this method should also print the inserted tuple on the screen
(using the Printer class).
RC Delete (const char *relName, int nConditions, const Condition conditions[])
This method should delete all tuples in relation *relName
that satisfy the specified conditions. If there are no conditions,
then this method should delete all tuples in *relName. (Note
that even if all tuples are deleted, the relation should not be
destroyed, it should just become empty.) Recall that within argument
conditions, attributes may be specified by attribute name
only (i.e., relName may be NULL in RelAttr
structures), since there is only one relation involved.
Any number of conditions is allowed. Let R be relation
*relName, suppose there is a condition of the form
R.A=v where v is a constant value, and
suppose R has an index on attribute A. Then you can use
an index scan to efficiently fetch those tuples of R satisfying
R.A=v. For each fetched tuple, check the
remaining conditions and, if they are all satisfied, delete the tuple.
If there is no condition with an appropriate index, then a file scan
should be used. Each tuple is fetched, the conditions are checked,
and if satisfied the tuple is deleted. Don't forget to delete
corresponding index entries when you delete a tuple.
Although the execution strategies for Delete commands
generally are simpler than for Select commands, you should
still build physical query plans for Delete. You will
probably find that some physical operators from your implementation of
Select can be reused.
To provide feedback, this method should print the deleted tuples on
the screen (using the Printer class).
RC Update (const char *relName, const RelAttr &updAttr,
const int bIsValue, const RelAttr &rhsRelAttr,
const Value &rhsValue, int nConditions, const Condition conditions[]);
This method should update all tuples in relation *relName
that satisfy the specified conditions, setting the value of attribute
*updAttr in each updated tuple to a new value. If
bIsValue=1 then the new updated value should be the constant
in rhsValue, otherwise the new updated value should be taken
from the attribute specified by rhsRelAttr. If there are no
conditions, then this method should update all tuples in
*relName. Conditions are exactly as in method
QL_Manager::Delete. You can use the same strategy for
exploiting an index in this method as was described for
QL_Manager::Delete, with one important exception: don't scan
an index on the attribute being updated unless you have accounted for
this case in your IX component! If you are updating an indexed
attribute, don't forget to delete and then insert index entries for
each updated tuple. You should build physical query plans for
Update commands, and again you will probably find that you
can use reuse previously implemented physical operators. To provide
feedback, this method should print the updated tuples on the screen
(using the Printer class).
QL_PrintError
void QL_PrintError (RC rc);
This routine should write a message associated with the nonzero QL
return code rc onto the Unix stderr output stream.
This routine has no return value. As in the SM component, you should
not call QL_PrintError directly from within the QL (or any
other) component. QL_PrintError is called automatically by
the parser when it receives a nonzero return code from an SM component
method it invokes.
Your implementations of the RQL Select, Delete, and
Update commands should ultimately build a physical query plan
before actually executing the query. Query plans are not required for
Insert commands. Your system should provide a way to
pretty-print physical query plans for Select,
Delete, and Update commands before the command is
executed. You will probably want to render tree-structured plans
using some form of indentation.
Query plans should be printed only when the value of special global
variable bQueryPlans is 1. Two administrative commands are
provided in the parser for toggling the bQueryPlans variable:
queryplans on; | Turn on the display of query plans
|
queryplans off; | Turn off the display of query plans
|
You must #include "parser.h" in ql_manager.cc, or
any other files that access the bQueryPlans variable.
For full credit in this component of the project, the TA must be
able to easily understand your query execution strategy for any RQL
Select, Delete, or Update command based
upon the pretty-printed plan information. But again, please do
not print plans unless variable bQueryPlans is set.
- If you have been following our suggestions, then the system
catalog relations relcat and attrcat will be open at
all times during a RedBase session, and these relations will be the
only ones left open when a command is not underway. You should allow
RQL data retrieval commands to be performed on system catalogs. As
with the help and print utilities in the SM
component, you may want to force to disk the files containing a
catalog before executing an RQL query over it, unless you have adopted
a policy of forcing catalogs whenever they are modified. Users should
not be permitted to modify system catalogs using RQL commands.
- When implementing the nested-loop join operator as an iterator,
do not open and close the RM file for the inner relation on each
open/close iterator call, since all pages are flushed from the buffer
when a file is closed. Instead, create one RM_FileHandle and
use it to make repeated RM file scans.
Documentation, Testing, Submission, Etc.
|
- The requirements for documentation and the method of submission
are the same as for previous project parts, including citing in your
ql_DOC file any assistance that you received on this
component. In this submission please remember to include all updated
source files and the three executables: redbase,
dbcreate, and dbdestroy. All executables that you
turn in must be compiled with statistics
on.
- Remember to thoroughly test your system with large relations
and complex queries and updates. (For example, we strongly suggest
that you devise your own data in addition to the data provided in our
sample files.) As usual, we will conduct tests of our own during the
grading process.
- At the same time that we test each student's QL component for
grading purposes, we will be benchmarking each student's complete
system to determine the winners of the efficiency contest. If you
prefer that we not benchmark your system for the contest,
please notify the TA.