CS346 - Spring 2011
Database System Implementation
RedBase Part 5: The Personal Extension
Proposal Due Wednesday May 11
Implementation Due Thu.-Fri. June 2-3
The last part of the RedBase system you will implement is the
Personal Extension (EX). This part gives you the opportunity to
implement additional features you've thought about along the way,
more advanced or sophisticated techniques in the part of the system
that interests you most, or a new system component not included in
the basic RedBase project.
Date | Time | Event
---|---|---
TBA | TBA | Extra office hours (proposals)
Wednesday May 11 | 11:59 PM | Proposals due
Thursday-Friday June 2-3 | By sign-up | Final project demos
- You must submit a proposal for your personal extension, as described below. Proposals must be emailed to
cs346-staff@cs.stanford.edu by
11:59 PM on Wednesday May 11. Acceptable formats are plain text, PDF,
PostScript, Word document, or the URL of a Web page. No other
formats will be accepted. EX proposals are subject to the same late
policy as the other project parts.
- You are encouraged to discuss your ideas and proposal with
the instructor while you are developing them. In addition to regular
office hours, the instructor will hold extra office hours (times TBA).
- You will discuss and demonstrate the implementation of your
extension during the final project demos scheduled
for June 2-3.
The personal extension should require at least roughly the same
amount of programming effort as, say, the RM component of the basic
project. One reason for the formal proposal process is to let us
identify extensions that are likely to be far too much or too little
work, in addition to spotting obvious missing pieces or design flaws.
Nevertheless, even after proposal feedback, students often make
unanticipated discoveries about their extensions once they are into
the programming. The extension may turn out to be much more or much
less work than anticipated, or the overall design may need to change.
Treat your proposal as a guide, not a contract: it's no problem to
revise your extension along the way, as long as you have implemented
a complete and interesting new feature by demo time.
Your proposal should consist of the following parts:
- A statement of the general functionality of the proposed
extension.
- A statement of which components of the basic RedBase system you will
need to modify or extend to accommodate the extension: PF, RM, IX, SM,
QL, and/or the parser.
- A description and/or diagram of the overall system design of the
extension, including how the extension fits into the rest of RedBase.
- An interface specification for the most significant C++ classes
and methods you expect to implement for the extension.
- An informal description of the functionality of each class and
method in the interface.
As a guideline, try to make your proposal resemble the documents and
lectures for previous project parts. Don't forget that the extension
proposal counts for 5% of your final grade, so it's worth putting
some effort into it. We'll be looking for a well thought out and
well-specified extension of appropriate scope, one that shows an
understanding of the RedBase system and of database system
architecture and implementation in general. Please try to keep your
proposal to approximately 3-5 pages.
Although you are welcome to share general ideas with your fellow
students, we ask that proposals be conceived and written up by each
student individually. For detailed help or feedback please see the
instructor or TA.
Class names, constants, etc. that are part of your personal extension
should begin with the prefix EX, even when the extension augments an
existing component of the system. (This convention lets us clearly
identify the extension within your code.) General implementation
principles (error handling, modularity, documentation, etc.) are the
same as for the other four project components.
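The following sketch illustrates the EX naming convention and the level of detail an interface specification might reach. It shows a hypothetical extension to the PF buffer manager: a clock (second-chance) page-replacement policy, one of the "flexible buffer management" ideas in the list of potential extensions. Every name in it (RC, EX_OK, EX_NOVICTIM, EX_ClockReplacer and its methods) is an assumption made for this example and is not part of the RedBase code you were given. The class is self-contained and is not hooked into PF.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical return-code type and EX-prefixed codes (assumptions for this sketch).
using RC = int;
constexpr RC EX_OK = 0;        // a victim frame was chosen
constexpr RC EX_NOVICTIM = 1;  // warning: every frame is currently pinned

// EX_ClockReplacer: hypothetical clock (second-chance) replacement policy
// that an extension to the PF buffer manager might provide.
class EX_ClockReplacer {
public:
    explicit EX_ClockReplacer(std::size_t numFrames)
        : refBit_(numFrames, false), pinCount_(numFrames, 0), hand_(0) {}

    // Record an access to a frame: set its reference bit and pin it.
    void Pin(std::size_t frame) {
        assert(frame < pinCount_.size());
        refBit_[frame] = true;
        ++pinCount_[frame];
    }

    // Release one pin on a frame.
    void Unpin(std::size_t frame) {
        assert(frame < pinCount_.size() && pinCount_[frame] > 0);
        --pinCount_[frame];
    }

    // Sweep the clock hand: skip pinned frames, give recently referenced
    // frames a second chance by clearing their bit, and evict the first
    // unpinned frame whose bit is already clear. Two full sweeps suffice.
    RC ChooseVictim(std::size_t &victim) {
        const std::size_t n = refBit_.size();
        for (std::size_t step = 0; step < 2 * n; ++step) {
            const std::size_t f = hand_;
            hand_ = (hand_ + 1) % n;
            if (pinCount_[f] > 0) continue;                 // in use: cannot evict
            if (refBit_[f]) { refBit_[f] = false; continue; }  // second chance
            victim = f;
            return EX_OK;
        }
        return EX_NOVICTIM;
    }

private:
    std::vector<bool> refBit_;   // per-frame reference bit
    std::vector<int> pinCount_;  // per-frame pin count
    std::size_t hand_;           // current clock-hand position
};
```

How victim selection would actually plug into PF depends on the PF buffer manager's internals, which this sketch does not model.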
Obviously you will need to develop your own test suite for your
extension.
When considering a particular extension, keep in mind whether it may
require you to modify the PF component or the parser. Modifying the
parser to augment existing commands or add new ones is not difficult,
especially if you have used Yacc before. See the file Parser.HowTo,
and feel free to get help from the TA. Modifying the PF component
should not be too difficult either, as long as you're willing to
plunge into someone else's code.
Remember that your project extension and the final demo count for 20%
of your grade in the course. Before your demo, we suggest you go over
the following points carefully and set aside extra time to prepare a
smooth demo once your coding and debugging are complete.
- We will stay right on schedule, so please arrive at Gates 430
a few minutes in advance of the demo time you signed
up for. Unless other arrangements are made, you will run your demo on
a desktop PC running Linux, most likely by using ssh to one
of the Sweet Hall machines.
- Please bring two hard copies of your ex_DOC file to
the demo. Like previous documentation files, it should describe your
design, key data structures, and testing strategy, and it should be no
more than 2 pages long. We will also have a copy of your graded
extension proposal on hand.
- You will meet with the instructors for about 20
minutes. During the first 5-10 minutes we'll discuss your extension,
its overall design, and perhaps some implementation details. In the
next 10 minutes or so you'll show us your system. If there's time
left, you'll sweat while we try out your system with a few queries or
commands of our own.
- It is to your great benefit to plan in advance how you
will demonstrate your system. You should focus primarily on
your extension; however, if you'd also like to exhibit features of
earlier components, you may do so, especially features you added
after you turned in the component. (For example, if you
were doing poorly early in the course but have worked like mad to
catch up, this is your opportunity to show us.)
- You don't want to spend your demo time typing queries or
thinking about what to show us, so you should definitely prepare a
script in advance. Nor should you simply invoke a large script and let
it fly. Instead, plan to use RedBase interactively, cutting and
pasting queries or commands from the script one at a time. Please
practice going through your script several times before the demo.
Also, please don't show us your exhaustive tests. Show a
representative sample that concisely demonstrates what your system
can do. How coherently you present your system does affect our
overall impression.
- The TA will look over your code for the extension,
although he will not actually run it outside of the demo. You must
submit your code within 30 minutes after your demo ends. This is a
hard deadline: our grading schedule is extremely tight, so we must
have your code immediately. Please submit all code that is new for
the extension. Don't submit libraries or executables, just the .cc
and .h files. Create the list of files by hand, save it in
submit.ex, and then type "submit -s 5".
Here is a list of potential extensions to get you started thinking
about some of the possibilities. This list is certainly not
exhaustive, and we welcome proposals for extensions not on the list.
The extensions on the list have varying scope, so you may want to
combine smaller extensions or propose a subset of a larger extension.
- Improved record management: variable-length records,
records spanning page boundaries, records larger than the page size,
records containing structured objects.
- Binary large objects: "blobs," sometimes called "long
fields," are arbitrarily large, untyped values for attributes (e.g.,
images, postscript files, etc.). Blobs generally are stored separately
from the data records, are referenced by appropriate pointers, and are
displayed by calling appropriate procedures.
- Text objects: similar to blobs, except that the objects contain
text and could be text-indexed.
- Record clustering: attempting to colocate records that
have the same values for one or more attributes.
- Table partitioning: partitioning tables horizontally or
vertically for more efficient access.
- Storing relations in indexes: storing relations in IX
index files instead of RM record files.
- Sorted records: maintaining records in sorted order from
the time a relation is created (for a given attribute or set of
attributes), sorting records on demand, using sort order to improve
query execution.
- Improved indexing: multi-column indexes, additional
indexing methods such as linear hashing, extendible hashing, R-trees,
bitmap indexes.
- Flexible buffer management: alternative page-replacement
policies, buffer space "chunking."
- Better join algorithms: nested-block join, sort-merge join,
hash-based join.
- Join clustering: storing two (or more) relations
together in one file in join order.
- Additional types of joins: semijoins, outerjoins,
antijoins.
- Auxiliary structures for joins: join indexes,
pointer-based joins.
- Query optimization: cost-based selection of query plan
using statistics; statistics maintained on-the-fly or recomputed on
demand.
- More of SQL: order-by, distinct,
group-by and aggregates, or, not,
subqueries, union, intersect, except,
like. (Look at a SQL manual and pick features that interest
you.)
- OLAP support: star schemas, bitmap indexes and joins.
- XML support: XML storage, XML queries, using XML to transmit
requests and/or results.
- Concurrency control: locking, "begin transaction"
command, "commit transaction" command.
- Recovery: logging, "begin transaction" command,
"commit transaction" command, rollback command.
- Security: authentication, authorization management
based on user privileges, encryption.
- Compression: schemes for reducing storage space for
records, indexes, metadata.
- Versioning: storing and accessing multiple versions of
each relation or database.
- Advanced features: external function calls from queries,
stored procedures, views (virtual or materialized), integrity
constraints, triggers, temporal support.
- Networking: client-server model, distributed RedBase,
RedBase as a Web service.
- Application interface: a custom C++ interface or a standard
one such as ODBC or JDBC.
- User interface: friendlier and/or more sophisticated
command interface, graphical user interface, HTML or Java-based
interface.
- System visualization: query plan visualization and
exploration, query execution visualization.
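To give a sense of the scope behind a single list item, here is a minimal sketch of the hash-based join mentioned under "Better join algorithms." It makes two simplifying assumptions: the build input fits in memory, and tuples are represented as vectors of integers. The EX_Tuple type and EX_HashJoin function are hypothetical. A real extension would instead read RedBase records through RM or IX scans.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical in-memory tuple representation: a row of integer attributes.
using EX_Tuple = std::vector<int>;

// EX_HashJoin: equijoin on build[buildCol] == probe[probeCol] using the
// classic build/probe hash join. Assumes the build input fits in memory.
// Each result tuple is the build tuple's attributes followed by the probe tuple's.
std::vector<EX_Tuple> EX_HashJoin(const std::vector<EX_Tuple> &build, std::size_t buildCol,
                                  const std::vector<EX_Tuple> &probe, std::size_t probeCol) {
    // Build phase: hash every build tuple on its join attribute.
    std::unordered_multimap<int, const EX_Tuple *> table;
    for (const EX_Tuple &t : build) {
        assert(buildCol < t.size());
        table.emplace(t[buildCol], &t);
    }

    // Probe phase: for each probe tuple, emit one result per matching build tuple.
    std::vector<EX_Tuple> result;
    for (const EX_Tuple &p : probe) {
        assert(probeCol < p.size());
        auto range = table.equal_range(p[probeCol]);
        for (auto it = range.first; it != range.second; ++it) {
            EX_Tuple out = *it->second;                 // build attributes first
            out.insert(out.end(), p.begin(), p.end());  // then probe attributes
            result.push_back(out);
        }
    }
    return result;
}
```

Building the hash table on the smaller input keeps memory use down while the larger input streams through the probe phase. Inputs too large for memory would need a partitioned variant such as Grace hash join.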