CS145 Lecture Notes (1) -- Introductory Material
This lecture material complements the assigned readings: Chapter 1
of the textbook, and the Introduction to "SQL for Web Nerds" by Philip
Greenspun (linked from the Course Schedule). Please do the
readings.
Database Management System (DBMS):
Provides
efficient, convenient, and safe multi-user
storage of and access to massive amounts of persistent
data
Most familiar use: many Web sites rely heavily on DBMS's. (And you
will build one!)
Examples: (solicit from class)
Non-Web examples of DBMS's: (solicit from class)
Example: Online Bookseller
Data = information on books (including categories, bestsellers, etc.),
customers, pending orders, order histories, trends and preferences, etc.
- Massive: hundreds of gigabytes at a minimum for medium-size
bookseller, more if keep all order histories over all time, even more
if keep clickstream logs, even more if keep images of book covers and
sample pages
=> Far too big for memory
- Persistent: data outlives programs that operate on it
- Multi-user: many people/programs accessing same database, or
even same data, simultaneously
=> Need careful controls
Multi-user example and discussion:
Jane and John both have ID number for bookseller gift certificate
(credit) of $200 they got as a wedding gift.
Jane @ her office: orders "Database Systems: The Complete Book" ($95)
prompt user for credit ID;
get credit from database;
if credit >= 95 then
credit := credit - 95;
issue order to mail book;
if OK then put new credit into database (else handle error);
else print "sorry"
John @ his office: orders "A First Course in Database Systems" ($75)
prompt user for credit ID;
get credit from database;
if credit >= 75 then
credit := credit - 75;
issue order to mail book;
if OK then put new credit into database (else handle error);
else print "sorry"
Initial credit = $200
Ending credit = ???
Appears similar to concurrent programming problems (synchronization,
semaphores, etc.)
BUT: data not main-memory variables
Appears similar to file system concurrent access
BUT: want to control at smaller granularity
Also database may be distributed, replicated, etc.
- Safe: (1) From system failures (2) From malicious users
- Convenient: Simple commands to -- find a specific book,
list all books in a certain category and price range, generate an
order history, produce sales figures grouped by state, etc.
=> Also unpredicted queries should be easy
- Efficient: Don't search all files in order to -- get
price of one book, get all customers from northern California, get
bestselling books from last week
Massive data => DBMS's carefully tuned for performance
A DBMS is a Software System
- Buy or download, install, set up for particular application
- Available for PC's, workstations, mainframes, supercomputers
- Frequently used in conjunction with application
server or other middleware, or exposed as a web service
- Basic software infrastructure: (figure)
Major DBMS Products and Freeware
(Fewer vendors every year...)
- Oracle
- IBM: DB2, Informix
- Microsoft: SQL Server, Access
- Sybase
- MySQL
- Postgres
- Derby
All are "relational" (or "object-relational") database systems at
their core.
All commercial vendors also provide XML support, either
built-in or built-on-top.
Terminology and Basic Ideas
Data model
- Describes conceptual structuring of data stored in database
- Example: data model is set of records. E.g., records
might each have student-ID, name, address, courses, photo
- Example: data is XML document. E.g., document might
contain list of books with ISBN numbers as ID's, titles and author
names as subelements
- Example: data is graph. E.g., nodes might represent
cities, edges represent airline routes
Schema versus data
- Schema describes how data is to be structured -- defined at set-up
time, rarely changes (part of the "metadata")
- Data is actual "instance" of database, may change rapidly
- Compare to types and variables in programming languages
Data Definition Language (DDL)
- Commands for setting up schema of database
(add box to basic software infrastructure)
- Process of designing schema can be complex, may use design methodology
and/or tool
(add box to basic software infrastructure)
Data Manipulation Language (DML)
- Commands to manipulate data in database:
RETRIEVE, INSERT, DELETE, MODIFY
- Also called "query language"
People
- DBMS implementor: builds system (245, 346)
- Database designer: establishes schema (145)
- Database administrator: loads data, keeps whole thing running (145)
- Database user: queries/modifies data (145)
Less Traditional Database Applications
- Stock monitoring, air traffic, telecom call records, network and
system monitoring: real-time data streams, historical data and
queries, "active" database
- Distributed, heterogeneous databases: collaborative design,
medical information systems, data warehousing, comparison shopping,
"querying the Web"
- Scientific data -- e.g., sensors, experimental result data,
satellite feeds: terabytes or much more, uncertain/approximate data,
specialized algorithms and query languages
Towards the end of the course we will learn about systems being
developed to address many of these applications.