Gio's Movie files: DOC
This the Master file on [Varese]/MyDocs/mov/xml/doc.html; copyright 1996, 1997, 1998, 2004 by Gio Wiederhold. Free use is permitted with identification of the source.
Loaded into DB web files 25 September 1997. Updated November 1999, June 2004.
Fixes to XML linkage made Sept.2001, May 2003.
The initial objective was as to provide test data for students, to allow non-trivial exercises. There are fields suitable for complex joins, outerjoins, temporal joins, and recursion. It has also been used to investigate object-structured database technologies.
This is a HTML version of the description of the movies database. It was based originally on the description from the Kamens and Wiederhold temporal paper. It also contains literature references in appendix A.
The HTML files are no longer being updated. Starting Nov.1999 and completed in 2004 the same and more information is available in XML form. The working documentation for the full XML conversion is dtd.html, with links to the current XML files and the documents describing the XML files.
The XML versions may undergo further restructuring to explore the most effective way to present XML and eventually RDF information.
While the description below refers to the original SQL files, the TeX version, and the subsequent HTML version, the meaning of the fields has remained substantially the same.
The database was used originally in implementing the temporal queries in SQL is Wiederhold's movies database (see [21]) and for a number of other projects, both in Stanford classes and by research at the Un.of Maryland and others. The database allows testing of theories about an implementation. A `remake' file allows testing of recursive procedures. Since it contains "real data", it allows checking of results for semantic as well as syntactic correctness. Finally, the tables in the database consist of both journal and history relations (see 21, section 3), allowing us to test all of the different elements of an implementation.
The Overview section (2) below contains a summary of all the files and the URL pointers to them, or their parts or sections; see Section 3 for a full schema description.
A number of the fields are unsuitable for relational implementation, because they contain sets of values. These are best used in an object-capable implementation. The schema describes when such sets are to be expected. Encodings for several of the fields are listed in Section 4.
At the end of this document, in Appendix B., are some conversion rules for converting the HTML format to fields for relational or object databases.
The central file (MAIN) is a list of movies, each with a unique identifier. These identifiers may change in successive versions. The actors (CAST) for those movies are listed with their roles in a distinct file. More information about individual actors (ACTORS) is in a third file. All directors in MAIN are listed in a fourth file (PEOPLE), with a number of important producers, writers, and cinematographers. A fifth file (REMAKES) links movies that were copied to a substantial extent from each other. The sixth file (STUDIOS) provides some information about studios show in MAIN. This documentation file provides supplementary information, and is an essential part of the database. Some images are also available, but not on-line now. There are many cross-linking names throughout the files, so many in fact that web browsers have choked when trying to make them live. Currently we simply try to be careful in naming films and people consistently. More detail follows below. Counts have been made using emacs `count-matches'.
The main movies file is main.html. It
contains 11435 `tr/td' table row entries (July 1999). It is complete for
Hitchcock movies and TV shows, and has most films that can be related by topic,
actors, director, history, and such to those films. It has also many films
corresponding to a variety of interests of mine, former students etc., with an
emphasis on historical value. MAIN is the "central" relation in the
database; it is a journal relation that contains information about the movies
themselves. The relation contains information about movies such as their
titles, type, directors, and producers, as well as their year of release.
There is also a subset file of 82 Hitchcock full length
Movies.
The file actors.html has
6813 `tr/td' table row entries (July.1999) for many of the actors appearing in
CASTS.
Also First part
and Second part of
actors.html, not maintained.
The key of the relation is "stagename", and there are temporal
intervals indicating the dates that the actor worked and the actor's lifetime.
Other information in this relation is the actor's real name, background, and
the type of roles he/she typically plays. References to images are kept here
too.
in file people.html The
file lists all directors, as well as some other movie people, as producers and
cinematographers (A total of 3290 `tr/td' table row entries, 3011 `@' directors
as of July 1999).
The directors table is similar to the actors table in that it contains temporal
intervals for when the director worked and when he/she lived. The key of the
relation is the field "name", which is the name under which the
director directed. Director's key names do not contain any blanks. Typically
the last name is used, when needed prefaced by an initial. A secondary unique
key is defined for each director, up to three letters, based on the initial
letters of the first, middle, and last names. This key will provide HTML HREF
linkages among many of the files. As with the actors table, this table also
includes the real name of the director among its fields ("lastname"
and "firstname"). It also contains important producers,
cinematographers, musicians and composers, etc.
in file studios.html are
important studios only (203 `tr/td' entries, sparse information).
The key of the studios relation is the name of the studio. The temporal
information that is included is an interval indicating the years the studio was
(or is) in operation, represented by the fields "startdate" and
"enddate". This is a history relation.
in file casts.html This is
a large (too large?) file documenting who acted in what role in which movie.
(46 009 tr/td entries, only partial for movies and roletypes, July 1999). Casts
is an association relation, linking actors with movies. The key of the relation
is the catenation of the two fields "film_id" and "actor";
no temporal information is included in this relation.
This file was too big for Netscape in 1996, so that also five working subsets
were made available, however these are not kept up-to-date.
The four parts are still large files (6000-15000 entries; castsCH includes Hitchcock films
in file remakes.html
(1278 `tr/td' entries in July 1999).
This table (which is not extensively used in the temporal DB paper) gives
information about movies that are remakes of other movies. It is very useful to
test recursion in databases.
This file has been superseded. All entries are now in the
main file, as Alt(T:...) in the notes field. synonyms.html,
contained 379 entries in Sep 1997) This list relates to the MOVIES in
main.html.
Some movies are known by alternate titles, and can be accessed indirectly via
this file.
in file casts.html QUOTES. A few (26) memorable quotes from movies are listed in quotes.html.|1.
in file actors.html. Types of awards and the awarding agencies are in awtypes.html.
is no longer a distinct file Awards received for special occasions are listed with individual entries in the files for ACTORS (actors.html).or MOVIE PEOPLE (people.html). Regular awards associated with a particular movies are given in MOVIES (main.html) and with a particular performance are listed in CASTS (casts.html).
Books that provided material for this database are listed within this documentation file as Appendix A.
Codes for countries and origins are listed within this documentation file as section 4.3: doc.html GEO.
Codes for movie categories are listed within this documentation file as Section 4.4: doc.html CATS.
Codes for color processes used for movies are listed within this documentation file as Section 4.5: doc.html COLS.
Codes that specify role-types for actors
are listed in the preamble for casts.html ROLES.
Codes that identify subfields in various files are listed within this documentation file as Section 4.2: doc.html FIELDS.
Lists the award types used in MAIN, ACTORS, and PEOPLE, with the organizations who award them, and the span of years they were awarded.
there is a small collection of .tiff files for actors and directors. They are kept individually in an images subdirectory.
There are about a dozen icons to be used to identify subfiles. Some of them come from the New Yorker Magazine Jan.1993. There are kept individually in an icons subdirectory.
Here we give a detailed description of the schema of the
movies database, which is used for all examples in this paper and was used to
implement the temporal SQL additions. General descriptions are given in Section
2, above.
This file is being updated to desctribe the HTML version. Where updates were
made, the old material is in curly {brackets}.
Col-Name = Description
. There is a distinct table for each director (Hitchcock has multiple tables,
one for early silent, one for British, one for American, and one for TV
movies).
The tables are broken up by year of first known film by the directors. There
are some break and header records for each year.
Each director table has two types of records:
film_id =
title =
year =
director =
producers=
studios =
prc =
cat =
awards =
lc =
notes =
There is one record for each actor listed, but not all
actors listed in CAST are documented. There are also break and header records
for each letter of the alphabet.
Col-Name = Description
stagenm =
dowstrt =
dowend =
birthnm =
firstnm =
gender =
dob =
dod =
type =
origin =
photo =
notes =
Directors are the major subset of the general people.html
table. Other entries are significant producers, writers, art directors and some
authors. Being a director is indicated in the Pcode field, and has some effect
on other fields. There are also break and header records for each letter of the
alphabet.
Col-Name = Description
id-name =
Pcode =
Did =
yearstart=
yearend=
lastnm =
firstnm =
dob =
dod =
backgrd =
notes =
Col-Name = Description ----------- -------------
name =
company =
city =
country =
fddate =
enddate =
founder =
successor =
notes =
The large CAST file is broken up into sections by initial letter(s)
of the directors identifying code. In each section will be a number of
directors, ordered by code. There is a distinct table for each director.
Col-Name = Description
Each director table has two types of entries
film_id =
title =
actor =
roletype=
role =
awards =
notes =
Col-Name = Description
film_id =
title =
year =
part =
wasfilm =
wastitle=
wasyear =
This file has been superseded. All entries are now in the main file, as Alt(T:...) in the notes field. Col-Name = Description
film-id =
s-title =
s-country =
p-country =
p-title =
Col-Name = Description ----------- -----------
film-id =
title =
speaker =
role =
listenr =
quote =
Col-Name = Description ----------- -----------
award =
agency =
place =
Col-Name = Description ----------- -----------
recepnt =
award =
year =
reason =
notes =
Col-Name = Description ----------- -----------
no. =
author =
title =
pub-inf =
Col-Name = Description ----------- -----------
code =
. See 4.1 for the encoding used. country-name =
c-adjective =
Col-Name = Description
ctcode = Four-letter code
Movie categories |
||||||||
code |
category |
| |
code |
category |
| |
code |
category |
| |
Ctxx |
uncategorized |
| |
|
|
|
|
|
|
Actn |
violence |
| |
Advt |
adventure |
| |
AvGa |
Avant Garde |
| |
Camp |
now - camp |
| |
Cart |
cartoon |
| |
CnR |
Cops and Robbers |
| |
Comd |
comedy |
| |
|
|
|
|
|
|
Disa |
Disaster |
| |
Docu |
documentary |
| |
Dram |
drama |
| |
Epic |
epic |
| |
Faml |
family |
| |
Hist |
history |
| |
Horr |
horror |
| |
Musc |
musical |
| |
Myst |
mystery |
| |
Noir |
black |
| |
Porn |
pornography |
| |
Romt |
romantic |
| |
ScFi |
science fiction |
| |
Surl |
sureal |
| |
Susp |
thriller |
| |
West |
western |
| |
|
|
|
|
|
|
Col-Name = Description ----------- -----------
color-code =
full =
· specific color processes
They are listed in the preamble for movies.macros.
Col-Name = Description ----------- -----------
role-codes =
Col-Name = Description ----------- -----------
codes =
A fair amount of the information is encoded for consistency
of reference. Directors' names are always treated as codes, and many other
movie people as well. Names with out spaces are codes, and can be found in the
PEOPLE relation. Actors names are treated as codes as well, although here first
names have been retained. Many actors can be found in the ACTORS relation.
All movies have been assigned a code by catenating a director's identifier,
found in the PEOPLE relation with sequence digits.
Several code tables appear below, other used are
These codes are used in certain filed to further identify the contents.
Check this, much changed when moving to HTML.
code |
definition |
| |
||
T: |
film title |
| |
||
T2: |
redefinition of title in main.html SYNS. |
| |
||
T3: |
title used for locale file in MAIN |
| |
||
T4: |
title used for License plate list in MAIN |
| |
||
T5: |
title |
| |
||
T6: |
title used in casts.html SAYINGS. |
| |
||
T6: |
title used in quotes.html. |
| |
||
TS: |
not sure of actor spelling |
used in CASTS |
obsolete |
| |
TZ: |
title from Movies-dir.html |
| |
|
|
TZ: |
not sure if actor in this film |
used in CASTS |
| |
|
P: |
producer in PEOPLE |
|
| |
|
PN: |
producer full name |
not yet classified |
| |
|
PU: |
unknown producer |
not yet classified |
| |
|
St: |
listed studio |
not yet consistent |
| |
|
SN: |
studio name |
| |
|
|
SU: |
studio name unknown |
| |
|
|
SL: |
country or city of studio |
| |
|
|
SD: |
distributor |
old codes: Dtr, Ds, Dis{\Dtr |
| |
|
R: |
role |
used in CASTS |
| |
|
RZ: |
role uncertain |
used in CASTS |
| |
|
RU: |
role unknown |
used in CASTS |
| |
|
RN: |
only name in role |
used in CASTS |
| |
|
RS: |
spelling of actor's name unsure |
used in CASTS |
| |
|
D: |
| |
|
|
|
DN: |
director with full name, may not be in PEOPLE file |
was \DiF |
| |
|
DU: |
| |
|
|
|
Ancillary information, and social and professional
relationships is frequently given as sub-fields in notes. Below are the
prefixes used for such variable information. Some refer to candidate entries in
other files. Many occur in the MAIN file, in the final NOTES field, others in
the PEOPLE and ACTORS files. There should be only one sub- field of any type in
the notes field, but they may have contain multiple entries If the contents
references entries in a file, it is listed, but such a reference is only assured
if coded with a colon ( : ). Some of these designations are repeated with the
files themselves.
Code |
Contents |
Ocurrs in |
References what |
| |
Alias |
other professional names |
PEOPLE, ACTORS |
none |
| |
Also |
other professions |
MAIN, ACTORS |
PEOPLE |
| |
Alt |
alternative title |
MAIN (SYNONYMS) |
none |
| |
B |
Book Author |
MAIN, ACTORS |
PEOPLE |
| |
BS |
was Brother or Sister of, use Si |
PEOPLE, ACTORS |
PEOPLE |
| |
C |
Cinematographer |
MAIN, PEOPLE, ACTORS |
PEOPLE |
| |
CoD |
CoDirector |
MAIN |
PEOPLE |
| |
Cost |
Cost to make film [M/K] |
MAIN |
none |
| |
Ch |
child of |
PEOPLE ACTORS |
PEOPLE, ACTORS |
| |
Chor |
choreographer |
MAIN, ACTORS |
PEOPLE |
| |
Dtr |
distributor |
MAIN |
STUDIO |
| |
Er |
possible error, verify |
all |
none |
| |
Fd |
Founded: |
STUDIOS |
PEOPLE , ACTORS |
| |
Fdr |
Founder of: |
PEOPLE , ACTORS |
STUDIOS |
| |
Gn |
F |
PEOPLE |
used for Female producer or director |
| |
Inc |
Income from film [M/K] |
MAIN |
none |
| |
Inf |
influenced |
PEOPLE, ACTORS |
PEOPLE , ACTORS |
| |
Inb |
influenced by |
PEOPLE, ACTORS |
PEOPLE, ACTORS |
| |
Lw |
Lived with |
PEOPLE, ACTORS |
PEOPLE, ACTORS |
| |
M |
composer |
MAIN, PEOPLE, ACTORS |
PEOPLE |
| |
Mt |
Married to |
PEOPLE, ACTORS |
PEOPLE, ACTORS |
| |
Nt |
note |
all |
none |
| |
P |
playwright |
MAIN, ACTORS |
PEOPLE |
| |
Pt |
parent of |
PEOPLE, ACTORS |
PEOPLE, ACTORS |
| |
Ph |
unusual physical characteristics, |
ACTORS |
none |
| |
R |
ARt director, now V |
MAIN, PEOPLE, ACTORS |
PEOPLE |
| |
Seen |
[dd[MMM]]yy |
MAIN |
none |
| |
St |
studio |
MAIN, ACTORS |
PEOPLE |
| |
Si |
|
PEOPLE, ACTORS |
PEOPLE, ACTORS |
| |
Ty |
typical style |
PEOPLE, ACTORS |
Table 4.4 |
| |
V |
Visual art director |
MAIN, ACTORS, PEOPLE |
PEOPLE |
| |
VT |
have video tape (number) |
MAIN |
none |
| |
W |
writer |
MAIN, PEOPLE, ACTORS |
PEOPLE |
| |
Ww |
worked with |
PEOPLE, ACTORS |
PEOPLE, ACTORS |
| |
Codes for countries and origins are listed below. It is still incomplete.
Common countries for movie making are listed first, followed by other countries in alphabetical order |
|||||||
. code |
country |
adjective |
| |
code |
country |
adjective |
| |
Am |
USA |
American |
| |
|
|
|
|
Br |
not used |
British |
| |
GB |
Great Britain |
not used |
| |
Fr |
France |
French |
| |
|
|
|
|
Ge |
Germany |
German |
| |
|
|
|
|
It |
Italy |
Italian |
| |
|
|
|
|
Ja |
Japan |
Japanese |
| |
|
|
|
|
Alphabetical by
code |
| |
|
|
|
|
||
Ar |
Argentinia |
Argentine |
| |
|
|
|
|
Au |
Australia |
Australian |
| |
|
|
|
|
Be |
Belgium |
Belgian |
| |
|
|
|
|
Bz |
Brazil |
Brazilian |
| |
|
|
|
|
Ca |
Canada |
Canadian |
| |
|
|
|
|
Ch |
China, PRC |
Chinese |
| |
|
|
|
|
Cz |
Czechoslovakia |
Czech |
| |
|
|
|
|
Da |
Denmark |
Danish |
| |
|
|
|
|
Gr |
Greece |
Greek |
| |
|
|
|
|
Du |
Holland |
Dutch |
| |
|
|
|
|
Hu |
Hungary |
Hungarian |
| |
|
|
|
|
In |
India |
Indian |
| |
|
|
|
|
Ir |
Ireland |
Irish |
| |
|
|
|
|
Me |
Mexico |
Mexican |
| |
|
|
|
|
Os |
Austria |
Austrian |
| |
|
|
|
|
Pe |
Peru |
Peruvian |
| |
|
|
|
|
Ru |
USSR, Russia |
Russian |
| |
|
|
|
|
Sp |
Spain |
Spanish |
| |
|
|
|
|
SA |
South-Africa |
South-African |
| |
|
|
|
|
Yu |
Yugoslavia |
Yugoslav |
| |
|
|
|
|
Zw |
Switzerland |
Swiss |
| |
|
|
|
|
Codes for movie categories are listed below. Mainly used in MAIN,
also content of Type fields. Multiple entries are possible.
code |
category |
| |
Susp |
thriller |
| |
CnR |
cops and robbers |
| |
Dram |
drama |
| |
West |
western |
| |
Myst |
mystery |
| |
|
|
|
S.F. |
science fiction |
| |
Advt |
adventure |
| |
Horr |
horror |
| |
Romt |
romantic |
| |
Comd |
comedy |
| |
Musc |
musical |
| |
Docu |
documentary |
| |
Porn |
pornography, including soft |
| |
Noir |
black |
| |
BioP |
biographical Picture |
| |
TV |
TV show |
| |
TVs |
TV series |
| |
TVm |
TV miniseries |
| |
Codes for color processes used for movies in MAIN are listed
below.
code |
full name |
description |
| |
|
prc |
unknown |
|
| |
|
col |
color |
color film, common after 1955 |
| |
|
bnw |
black-and-white |
b-w film common before 1945 |
| |
|
sbw |
silent |
silent black-and-white film |
| |
|
cld |
colored |
black-and-white film recolored |
| |
|
Cart |
cartoon |
Cartoons are normally colored |
| |
|
Tcol |
Technicolor |
high quality color |
| |
|
Ecol |
Eastmancolor |
color by Kodak |
N\t(unstable) |
| |
Wcol |
Warnercolor |
|
| |
|
Mcol |
Metrocolor |
Color by MGM |
| |
|
Acol |
Anscocolor |
color by Kodak? |
| |
|
Agcol |
Agfacolor |
|
| |
|
Fcol |
Fujicolor |
|
| |
|
DeLuxe |
DeLuxe |
low cost color |
| |
|
DuArt |
DuArt |
color |
| |
|
Movielab |
MovieLab |
color |
| |
|
CS |
Cinemascope |
widescreen, mostly color |
| |
|
Trama |
Technirama |
widescreen color |
| |
|
Pan |
PanaVision |
|
| |
|
TV |
film made for TV |
various processes |
| |
|
Vst |
Vistavision |
|
| |
|
Books, etc.
Electronic material
Books about Actors
BOOKS I have for Movie Stories:
This section refers to the original HTML files. The notes are still being developed.
To convert the source files from HTML format to another type of database:
(we use [] to denote HTML `french' brackets.)
If both `;' and `,' appear in a field, then the `;' typically distinguishes a major group relative to the `,' value separator, as [td] island, South Pacific; court, SF, CA[td].