January 1994 Report No. STAN-CS-TR-94- 1500 1993 Publications Summary for the Stanford Database Group by Marianne Siroker Department of Computer Science Stanford University Stanford, California 94305 !2~ZEo "~ 1993 Publications Summary for tbe Stanford Database Group Department of Computer Science Stanford University Stanford, CA 94305-2140 USA Contact: Marianne Siroker email: siroker@cs.stanford.edu phone: (415) 723-0872 ||CO~ENT: sub-ittod ror publication, Doco~bor 199 ~ Flexible Relation: An Approach for Integrating Data from Multiple, Possibly Inconsistent Databases Shailesh Agarwall, Anhur M. Keller2, Gio Wiederhold3, and Krishna Saraswa* A hctrart In lhis work we address the problem of dealing with data inconsislencies while inlegraling dala sels derived from multiplc aulonomous relalional databases. The fundamenlal assumption in the classical rela~ional model is that data is consistent and hence no support is provided for dealing with inconsistent data. Due to this limitation of the classical relational model, the semantics for detecting, representing, and manipulating inconsislenl data have to be explicitly g Por~ions of Ihis research were funded by the Advanced Rcsearch Projecls Agency (N00014-90-J- 4016) Ihrough Ihe Texas Inslrumenls MMST projecl he Semiconduclor Research Corporalion (SRC-90- MC- 106) and Ihe National Science Foundalion (IRI- 9007753-A2). I Persislence Sohware 1700 Soulh Amphlat Boulevard Suile 250 San Mateo CA 94402. E- mail: sagarwal@persislence.com. This work was done while the au~hor was a~ S~anford University. All conlmunica~ions regarding ~he paper should be nddres.~ed lo him a~ the above address. 2 Depanmen~ of Compu~er Science. Slanford Universily S~anford CA 94305-2140. ~ I~epar~men~ of Compu~er Science S~anford Universi~y Slanford CA 94305-2140. The aulhor is curren~ly a~ Advanced Research ProJecls Agency 17()1 Nonh Fairfax Dnve Room 739 Arling~on VA 22 2(~- - I 7 1 4. 4 I)ep~nmenl of Eleclncal Engineenng Stanrord ver.~lly .SI~nrord. CA 91305. encoded in the applicalions by the application developer. In this paper, we propose the flexiblc rclational model, which e~tends the classical relational modcl by providing suppon for inconsistent data. We present a flexible relation algebra, which provides semantics for database operations in the presence of polentially inconsistent data. Finally, we discuss issues raised for query optimization when the data may be inconsistent. 1. Introduction Advances in compu~er nelworlclng technology and the availability of cconomical computing hardware have led to a proliferation of autonomous databases connected by high speed communication networks. As a result of this greatly increased access to remote databases, a growing number of database applications need to jointly manipulate data located in multiple databases lLitwin89, Breitbart90]. A number of differenl terms such as distributed databases, multidatabases, federated databases, and interoperable databases have been used in the literature to describe sets of multiple databases which are jointly manipulated ICeri84, Li~win90, Sheth90, Brigh(921. For (he purposes of (his paper, Ihe CO~ENT: Stan~ord Univ~r~ity Co~put~r Sci~nc~ D~part-~nt T~chnical | Roport Stan-CS-93-1496 (Nov. 1~93), ~ubmitt~d 12/93 ~or con~erence publication Using Delta Relations to Optimize Condition Evaluation in Active Databases~ Elena Barali~ Jennifer Widom Departmcot of Compute~ Saence St oford Uni er it~ St nford, CA ~30~21~0 {barali-,~idom}Ocs d~oford edu ~b-tract We gi~re a melbod for impro~rin~ the efficienc~ of condition e--lu- ion durin~ rule proces i;o~ in acti~e datab~ e ~y~tem~ Tbe melhod derin~, from ~ mle conditioo, t~o optimi ed condition- tbat can be u ed in place of tbe original condition ~hen ~ pre iou- ùalue (true or f l e) of the ori~inal condition i~ ~no-~n Toe deri~ed cooditio0 re more eflicieot th o the ori~ioal cooditioo becau e the~ replace refereoce~ to entire d~t~e rel~tionr b~ refereoce~ to ~c~ rel ~unu, ~rbicb trpicallr are mucb ~maller Delta rel~tion~ re accer ible to rule cooditioo~ in Imo t all current acti~e databa e ~tem~, ma}io~ thi- optimi atioo bro dl~ ~pplic ble We I o ~pecif~ an attribute ~rammar ba ed implementatioo of our method 1 Introduction Active database systems allow wers to specify c~cnt-condition-action rules that are proce~-ed ùu- tomatically by the databa e syctem in re pon e to data manipulation by u~er- and application~ A rule's ~vcnt specifies ùrhat cause~ the rule to become triglsered; trpical triU~erin~ e-ents are data modification or data retrie~ral A rule's condition i~ a further qualification of a triB~ered rule, u ually e~pressed as a predicate or query o~rer the databa e A rule's action i~ performed ~rhen the rule i~ triggered and it~ condition is true; actions wually are sequences of arbitrary databa e command~ Mo~t current acti~re databa e ~y~tem~ (both re earch prototype~ nd commercial ~yctem~) u e thi~ rule paradi~ ee [HW93] Rule proceuinls in active databa e cyCtemJ u ual~y con i~t~ of an iterative cycle in ù-hich (1) a trig~ered rule i~ ~elected; (2) the rule'~ condition ic en~luated; (3) if the condition i~ true the rule'~ action i- e~ecuted In thi~ paper ~re Kive a method for optimitin~ ~tep (2) in thi~ cycle for acti~re databa~e~ that we the relational model l Our method i~ ba ed on the follo-ring t-ro a~umptiow Thi~ ~or~ ~rn~ p rti 11~ pcrfiorm~d ~hile tbe ~uthor~ ~ere ~t the IBM Almnden He e rch Center, S n Jo e, CA At S~ n~ord thiJ ror~ ~ n- ~upported b~ equipment ~r~nt~ from Di6itJ Equipment Corpor-tion ~nd IBM Corpor~tion B~r~ ' perm-nent ddre - i~ Dip rtimento di Autom~tic~ e Inform-tic~, Politecnico di Torino, Cor o Duc- de61i Abru~i, 2~ - 10129 Torino, It~l~ IAd-ptin6 the m~thod to ncti~e OODB'~ i- pl~ned for rurth~r ~rort; ee Section CO~ENT: Short v-rcion to App Ar IFIP Con~-ronc- on ~ppIic~tion~ ~ | and Di~tribut-d Co-puting, April 1~. Long v~r~ion ~ubaittod ~or journal publication. Replicated Data Management in Mobile Environments: Anything New Under the Sun? Daniel Barbara Hector Garcia-Molina Matsu-hita In~ormation Techoology Laboratory Slanford UniYer ity 2 Research Way, 3rd FloorDepartment of Computer Science Princeton, NJ 085o USAStanford, CA 9U05 USA danielQMITL.Re earch.Panasonic.COU heclorQc-.-t nford.edu Octobcr 22. 1993 b t~ The mobile ~irele - compu-in~ ~n~ir~men of ~e fu~ure ~ill coo~ e ~lumb~ of lo~ po~red p~lm~op m~chioe~ Replic--ioo ~ill be ~n e~o~l toc~ique in tlli- en~i~oomeo~, pro- y d--~ il bili~y o th~ ù~ em ID ~ rnobile en~ir~m~ it i- impor~t o h~e ù~ut replic~t ~d d~ r~n~men~ ù16ori-hm ù~ ùllo~ ror ilU~DC~ COp~ ~0 mi~r~-e from o~e ik to ~o~ber or ror ne r copier o be ~en~r~d In ùhi p per ~ ~o~ tl~ ~cll d~mie ~4Ori~ cu~ b~ ob~ ed ~imply b~y le--i~ ruu~ upd-~e ~e ~or~ ~h - ~ie~ ~it~ boldi~ copi~ 'lbu~ ~e ~u~ th t oo fuDd meDt~llr ~ ù4Ori~hm ùre noal d o cope ri D mobili-r. Ho~r, e~i-i~6 ~4ori-hm- m~y h~e o be ~u~ for ~ mobile en~ironmen-, ~d ~e di eu~ ~h.~ ~bi~ m y en- il J~ ~n illu r~ion, ~e pr ent ~ ion of ~h~ prim rr cop~ ~4Ori~hm, Prim rr Br Ro~, ~h~t i~ ~11 uited for mi6r~1i~ cop~ ~. K~y~ord~ Di- ribu-ed D~ B~, replic-~ioo, mobUi-r, ~ibbili~y I Introduction The mobile riKleu computing environment of the future lIB921 will contain lar~e numberr of low po ~ered palmtop machine~, querying databa~ over ~ iKle ~ channeb The unit~ rill often be di~connected due to power limitation~, io cce- ible communicatioo chaonel~, or a~ unit~ mo~e bet~een differcnt cell~ The ability to replicate data object~ in ruch o en~ironrnent ~ill be e~ ential Object copie~ are the l~ey to high data a~ailability ~hen a unit ir di~connected it c o continue to proceu object~ stored locally At the ame time, replicated data cao impro~e perforrnance a copy at a nearby or ler- conge~ted rite can be acce ed Thu~, ~re expect copie~ to be comrnon both ~No e to ~e ~ef~ ù A mucb ~lor er ~O~I of tlli- p per rill ppeur iu ~be IFIP Co~re~ce o~ Appic~tio~ i~ P rullel u a D_ ributed Comp~ti-~, C r-c--, Vene ~d~, Ap il 199 T~ Iimi ed o 10 pye, u d COU~UIU u~d ~0% of ~e ro~ of tl~i~ rnuD~crip I~e~ iell~ coude~ ed ù~e~ of ~e lird follr ~octioo- of m~crip~. ù TD~ P Per i~ iDtcDded to be ù ur e~ e p-per of p e~io u ~or~ i ~ tbe re~ ~d i - pplic~bi~ o mobili ~Dnro~mc~ . TDe ~ut~or~ und~ uDd h- th~ VLDB J >ur~l ~Icome contribetio u of l~ re ~a ||CO~ENT: Submitt~d ~or con~r~nc~ publication. _ Constraint Management in Loosely Coupled Distributed Databases~ Sudar~han S Chawathet Hector Garcia-Molina Jennifer Widom Dep~rtmeDt of Computer Science St~ford Uni~er it~ Stallford, CA ~30~21~0 ~cha~,hector,~idom}Oc- -t~ford edo ~bctraet We pro ide a frame~or~ for man~gil~g integrit~ co~tr~t~ tll-t rp~ multiple d~t~ff in loo el~ coupled, heterogeDeou~ en~ironment~ Our frYne~o~ e~abk~ the formal de cription of ( I ) inlerface~ prmrided by ~ dat~ba e for the d t~ itemJ i~ ed il~ mult~d~t~e colutr It~; (2) ~trategie~ for monitoring and maint nin~ multi-d~t~ba e co~tr~tr; (3) 6u rultee- re6 rd- ing the con~i~tenc~ of multi-d~l~ba e con-lr int~ Uci~8 our frYme~or~, it ic poe ible to rpecif~ and prm~e thal for ~ multi-d~tab e con~traint C, ~ et of interf cff for the d-l~ item~ in-ol-ed in C, and a ~trategr for monitorin8 or m intai~ g C, cert~ 6u rulteff c n be m de ~bout C'~ con~i tenc~r Our frame~orl~ incorpor~te~ the ootion of time, ~d it i- ~le~ible u~d 6eneral enough to be ~pplicable in a ride ~ariet~ of heteroReDeou- d~t ba e en ironment- 1 Introduction Integrity con traint~ in a databa e ùy~tem ~pecify conditionr that mu t hold for thc data to beconsidered conJiJtcnL For e~ample, in the well-lcno~rn emplo~ee and departmcnt databa e, n integrity constraint might specify that the avera~e employee ralary should not e~cecd a certain ~ralue; another constraint might specify that each department should contain some minimum number of employees Typically, inte~rity con traints are assumed to hold at the beBinninB of each databa e transaction and are guaranteed to hold at the end of each tran action En uring that a con traint holds at the end of a transaction can be achie~red by ~rerifying the tran action in advance, e g IH185], by checl~ineo the constraint at the end of the transaction and aborting the transaction if the con~traint doesn't hold, e K [Gre93], or by using acti~e databa e rule~ to trigKer additional actions that chec} and re tore the con~traint, e g lCW93] ùRe e rch ~pon ored b~ the Wri6ht L-bor-~or~ Acron-utic l S~tem~ C~Dkr Air Force U-kri l Comm Dd USAF und r Gr n~ Number F33~15-93-1-1339 The US Go-ernment i~ ùuthori ed to reproduce Dd di~tribuk reprint~ for Go~ernment pUrpo eJ not~ith t~ndin6 D~ cop~ri6ht not tion thereon The rie~n -Dd conclu~ion~ con- tein~d in ~hi~ document ùre tho e of the ùuthor~ nd ~hould not be inkrprekd ùr necee- riJ~ repre entin6 the offici l policie- or ~ndor ement~ ~ither e~pre-- or implied of W-i6ht L-bor-tor~ or the US Go-ernment Thi~ ~or~ ~r al~o ~upported b~ the Center for Inte6r-ted S~tem~ ùt St nford Uni-er it~ nd b~ equipment 6r-nt- from Di6it-1 Equipment Corpor-tion ~nd IBM Corpor-tion ~ Conl~c~ Author Tel (415)723~872 F~ (415)723-2588 CO~ENT: Stan~ord Univer~ity Technical Not~ Nu b~r ST~N-CS-TN-93-00 Submitt~d ~or con~r~nc~ publication. p The Efficacy of GIOSS for the Text Database Discovery Problem Stanford Uoiner itr Technical Note Number ST~N-CSTN-~2 Luis Grsvano~ Hector Garcia-Molinat Anthony Toma~ic~ Ab-tr~et The popularitr of in~ormatioo retrie~al h~ led u er~ to ~ ne~ problem firldill~ ~hich te~t databa e~ (out of tbou~and~ of c~ndidale cboice~) re the mod rele~t to ~ luer ~n-~ring a giu~n quer~ ~.ith a li~t of rele~ nt d~l~ba e~ i~ the k~ ù-~uc duc~er~ l~ The fir t part of ùhiJ paper pre ent~ a practic l method for atlac- 6 t~i- problem ba ed on e tiDnati the re ult ~iJe of a querr and ~ datab~ e The method i~ termed C1055-Clo-~ of Scrrer~ Ser~r Tbe recond part of thi~ paper e~aluate~ CIOSS oun~ four di~ferent em ntic~ to a~ er a u er'~ querie~ Real uJer~' querie~ ~ere u ed in the e~perimelltJ We bo de cribe e-er l ~rariation~ of CIOSS and compare their emc c~ In addition, ~e ~al~-e the dor4e co t of our approach to the problem 1 Introduction Information vendor~ ùuch a~ Dialo~s and Mead Data Central pro-ide content-inde~ed acce~ to mu1tiple databa e~ Dialo~ for iwtance ha~ o~rer three hundred databt~ In addition, the ad~rent of Archie, W~IS, World Wide Web, and other Internet tool~t ha~ pro~rided ea y, di~tributed acceu to many more hundreds of te~t document databa e~ Thu-, ut~ are faced ùrith the problem of finding the database~ that are relevant to their information need (the u er query) Thi~ paper pre ent~ a frameworl~ for (and analy~er a ~olution to) thi~ problem, ~rhich ~re call the tczit dat~uc ducot~ctry ptroblcm (also referred to a~ the treJOll~C ducovctr~ probkm in ome generally more comprehen i~re conte~t~) lSEKN921 and 1ODL93] provide ùunrey- of propo~ed ~olutiow to thi~ probbm The traditional information retrie~lal problem of finding document~ rele~rant to a u er query i~ ~tudied by (a) de~cribing an information retrievl~l model (con i~tin~ of a document representation, a query repret~entation, and a matching a4~orithm) and (b) e~raluatin~ the model in term~ of preci~ion and recall lTC92, SB88] Thi~ problem ha~ a ~inKle underlyinls ~emantic~ of producing all (measured by recall) and only (measured by preci~ion) the documents rele~rant to the query We e~pand on this rrameworl~ by moti~ratin~ four different ~emantic~ for the database di-co~rery problem Given a query and a set of databases, the wer may be intere~ted in (at lea t) four different JemanticJ for the query ù E~hau!ttit.e Seatrch. The wer is interested in the ~et of all database~ which contain any t~elctJant documents . St~nford univer.it~, computer scienc~ Dept, Mar6~r~t Jacl~ Hall, Sl~nford, CA 94305 Email 8r~-~o-C- ùt~n~ord ~du tSt~nford univer.it~, compu-~r sci~nc~ D~pt, Mar~ar~t J~ch H-ll, St~nford, CA 94305 E-m-il h~ctor~c- -t~ord du tPrinc~lon univ~r~ity, D~partm~nt of comput~r sci~nc~ curr~nt AddrNs St~nford univ~r it~, comput~r sci- ~nc~ D~yt, Mar6llr~t Jacl~s Hall, Stanford, CA 94305 E-mail to~a-ic~c- ~t~ord ùdu ||CO~ENT: appoar~ in Ine~rnational Con~r~nc~ on P~rall~l and Distribut~d Sy~t~a~ 93 || Centralized, Decentralized and Pipelined Algorithms to Parallelize Join~ ,~shish (lu,ota+ l~obcrl 1 T Morris' Arun ~Swami' llonesty (, Young' -~- D~partmcn~ of ('omput~r ~ci~nce, Stanford University, ~tanford, CA 9~305, IJ~A ~ IB~ Almad~n Rcsearch Center, San Jose, CA 95120-609~J, lJSA +agupta~cs stanford edu, ~trjtm,arun,young)Qalmadcn ibm com Ab~lrnct Ill parallelizing Ihe join operalion or dalabAse ~ysle~ls a pri~ary objec-ive is lo parliliorl Ihe worl~ ill a way Ihal ensures Ihal no proce~or ha~ a colllplelion ~ e ~or ils parl or Ihe jOi~l Ihal ~ar exceed~ hal ~r any oLher pmcessor While o~le can de i6n - p~ ot bi~l packi~l8 al~orilh~n Ihal achieves lhi~ al- exacllv such al6ori~ s can involve a cosl (inlto alld com~llu~licalion resources) Ihal dor~ ales Itle joill aclivi-y ilselr Tllererore Ih~ chall'n6e is locreale effeclive scalable heurislics which achieve Yoo(l resull~ wilh o~lly l~lar~inal overhead We describe Illr~e classes or al~orilllms c~n~r~l- ~z~d, ~ ntrallzcd and p~pcl~ncd l y~nl We i~l-roduce a laxollollly or join Iypes and use Ihis laxono~ny lo ~xplAill how Ille difrerenl algorilh~ls choose a poinl in h~ spe~lru~n of complexily robuslness arld overhead We presenl our resulls in Ihe rorm of "Iradeorr curves" wllich ~nalLe clear Illal very Iar6e response lime im- pro~el~lenls are oblainable rrom Illese algorilhms and al whal cosl ill resource consumplion We describe a spe~lrulll of algorilhms rangin6 from a c~nlralized al- ~orilhlll Ihal balarlces Ihe worl~ almosl exaclly (bul al a rairly high cosl) lo decenlrali~ed algorilh~ns which are lnore heurislic and achieve much Ihe same errect bul wilh l~luch lower overhead Then we describe a ~unable class or pipelined algorilhms which paru~lel- ~ically subsu~le a nunlber or Ihese melhods 'I'his paper shows Ihal Ihe proposed load-balancin6 alRorilhl~ls are robusl under all Iypes of join loads 11l sollle cases Ille alRorilhllls yield up lo Iwo onlers of lllaRllilude beller perrormance in response lime al lllar~illally higher resource cosl ( 10-30%) colnpared lo hasllill~ l'he laxollolny of join Iypes is illluilive and helprul ill cale6orizing al~orilllms 1 Introduction 'I'he joill is bolll olle of Ille l~lOSl comlllon and mosl rxpellsi~e operaliolls ill relalional dalabase sy~lelns ~II J i~ hellce a llalural call Ihlale ror paralleliza~ 'I'llere hd~ beell exlellsi~ ork oll parall~l dalaba~e ~ilrlll~ [I ~ Il 141 alld oll diil~r~ parallel joi al~r ~e ~oll~i(ler t~ or more column~ The join opera~ion can be deco~n- po ed inlo mulliple ~ubla h by par-i-ionin6 on Ihe join a Iribule(~) value~ each ~ubLa~l~ con~ Or join- in6 luple~ rrom Ihe l~ro relalioru ~hal have Lhe ame join column value i e ~al~in6 Ihe Carle~ian producl or Ihe corre~pondin6 ~ubsel~ or lupl~ rro~n eacb re- lalion The worl~ involved in any one or Ihe~ ~ub- la~ xn~ilive lo Lhe ~u or mulliplicily or value~ in Ihe operand relalion~ and Ihe correlalion belween Iheir rrequenl value~ llO 151 When Ihe join vorl~ i~ parlilioned amon6 mulliple proce~or~ in Ihe paral- lel join Ihi~ sllew in Ihe operand relaLion~ can resull in ~i6~1ificanl imbalance in Ihe ~vorl~ assi6ned lo Ihe difrerenl proc~or~ This imbalance can cau~e si6nif- icanl de6radalion in Ihe overall re pon e lime rOr Ihe join operalion and, a~ a re~ull, Ihere has been a lol of inkres~ in ~l~e v-handlinl al6orilhm~ for parallel join~ [8, 9, 16, 17] One can Iry lo handle Ihis ~l~ew by approprialely balancin6 Lhe dis~ribulion of Ihe ~ewed values an1on6 Ihe join proce~sors 13ul in cases of evere ~l~ew, no si~161e join proces~or can be aYsi6~1ed Ihe joinin6 ùub- lasl~ correspondin6 lo a sin61e value and ~pl~ ng, i ~ Ihe parlicipalion of more Ihan one processor in one of Ihese ~ubla~l~s, i~ ~lecessary Such ~plillinl is u~ually reasible bul re~ulls in exlra o~erhead For Ihe paral- lel join, Yplillin6 i9 achieved u~in6 a lechnique such as ~ragmcn~ rcpl c-tc (Ftl) 131 Thus Ihe ~llew 11andlilu problem can be Lhou6hl of as a form Or lo d hltJnc- ~ng, allhou6h Ihe use of Ibe lerm here difrers from Ihe usa6e in Ihe operalin6 syslern conLe~l since we have prlon l~lowled6e or resource co~l-umplionY and are free lo splil laYl~s We consider joi~lin~ 2 relalions, R~ and R~, u~in6 asel orp jo~n JilC~ Jl, ,J~) The equijoin is over Ihe co~n~noll allribule A which lal~es valueY from he dolllun V=~l~l,...,v~}. We oflen U8e Ihe ler~ alY~ oJ a ~Yplc lo refer lo Il1e value or Ihe join al- lril ule A Or Ihe luple Wil11oul 1088 Or gell~ralily, we as~u~1le Rl and K2 are i~ ia11y rra6l~le~l1ed horizon- lall! over ~ ~ora9t ~lcl (S): ~.S,,. . . ,S,). We as~u~le a Ille~ia~e-pa~ G ar~hileclure ill wbic11 ~1ude~ are i~l- ler~u111le( led h! a l1t l~ork Ihal lel~ e~ery pair or lu~Jff CO~ENT: ~ubmitted ror journal pub. in ~ugu~t 93. E~tend~d ab~tract ~pp~ar~ in PODS 92 (pa~ 354-367). Magic-sets Transformation in Nonrecursive Systems ~ul)mi~Led ~o Tranoaclion8 on Dalabase ~SYJt~m8 Ashish Gupta'' Inderpal Singh Mumick ( olIll~uler ~;cience DeparLmen~ AT~T Bell Labora~ories 5;lanfor(1 Univer~ily600 Mounlain Avenue ~iLanrord, ('A 94305, USA.Murray Hill, NJ 07974, USA. agupla~cs slanror-l ~dumumick~research.all~com Augu~t 1993 1 Introduction The Problem: Most existing database systems (113~'s DB2, for example) do not support recursive queries. SQL is the universal query language amongst relational databases, and the currently implemented SQL standard (SQL2, llSO90]) does not provide for recursion. We ù ill call a database system nonrecursive if its query language does not permit recursive queries. l hus, l)B2 is a nonrecursive system. I he magic-sets transformation (IBR87, U1189]) is a general purpose query optimization technique initially proposed for recursive queries in deductive databases. It has been shown that the magic-sets transformation (MST) can be extended to optimize relational SQL queries where SQL has been extended to permit recursion lMFPR90, MPR90, Mum911. I'erformance experiments have demonstrated that MS T is an invaluable optimization for nonrecursive queries IMFPR90, Mum911. It is thus imperative that MST be used to opti- mize queries in a nonrecursive system. Ilowever, there is one problem that must be solved before MST can be used in a non- recursive system. MS~l has the undesirable property that it can transform a nonrecursive query into a recursive query, as in Example 1.1 below. EXAMPLE 1.1 Consider query Q on the nonrecursive program ~ (We use a letter, such as P, to represent a program comprised of rules Pl, ~2,. . .): (Q)~ (ø) rl) t(X) - s(X, Y) & y(Y). r2) S(x~ Y) - p(X) & 9(X, Y). ùUiork wa~ ~u~rled lly an NSF grallL IRI-91-lfi64i;, all Air F~>rce Rranl AFOSR-88-02fi6, alld a granl r ILll~ rl~orali~ FILE: /pub/~upta/1993/p~-improv.p~ CO~ENT: Tochnical R~port STAN-CS-~3-1473. Improvement to the PF Algorithm Ashish(JIlpta l)~partlllelll ~,r ( olllpuler ~iellce Slallror(~ ~Jlliver~ily ~;lallrord, (~A ~43()~' 14U alSupta~c~ . ~tan~ord. ùdu InderpalSingh Mumick A'l'k'l' t~ell Lat)<)ral~rie~ 600 Moulllaill Avellue Murray Hill, NJ 07974, llSA lmi( Llar~P~r~ A~ Ab~tract The 1'~' algorilhm, di~cus~ed in [HD921, compules changes lo a malerialized view when lle ullderlying ba~e relaliolls challge. The algorilhm considers each charlged base relalioll in lurll an~ propagale~ the changes lo lhe derived relalions. We propose ways of mal~ing Ihis pr(>paf~alion proce6s efficienl by exploiling slralificalion [lJl189] of program~ alld by sloring a ld reu~illg ~ome or Ihe illforlllalioll derived by Ihe PF aJgorilhm in ~HD921. 1 Introduction The ~iew maintellallce strategy outlined by Harrison and Dietrich operates as follows (page 60 [HD9'~]). Ea~h base relation is corlsidered in turn and all changes to it are propagated to derived relations. fu repreiellts the set of updated base relations, and ~œU is the corresponding delta set. If E is a base relation that is u,odated, then ~E ~ ~œU. Rrocedure IJpdat~ begi n For each ~E ~ ~œu do begin p =pred sym(~E); Rem tuples = removals(~E); Add tuples = additions(~E); If Rem tuples ~ ~ then PF(p,Rem tuples,r~ If Add tuples 7~ ~ then PF(p,Rem tuples,add) end; end ~uydate} Tllat is, for eacll uT)dated base relation, all changes to derived relations are computed; first all deletions are computed and then all tlle insertions. The procedures "rernovals" ("additions") iden- ilies the tuples inserted into (deleted from) relation E. For the deleted base tuples in Rem tuples, the PF algorithrrl first computes an overestimate of the set of deleted derived tuples. Each poten- tially deleted tuple in this overestimate, that has not been inferred as deleted before, is rederived u~ing the new base relations. Those potentially deleted tuples that have alternative derivations are not actually deleted. In case a potentially deleted tuple I canllot be rederived, thell tuple r i~ indeed regalded a~ d~leted and ~lle deletion i~ propagated to deri~ed tupl~s that depend on ul-k w~ ~u~ Jr~ .N~il' K~ s 1111-'.)1-16616 d~ 1-90 1635X, all~l Allo L~AAL-()3-(: (\1,7 P.lh of l.cng~h y I'igure 1: Wllell DRed beal~ 1'1' 1. If J can be rederived using the new base relations, lhell lhe delelion of I i~ nol propaga~ The advantage of doing lhis every time new polentially deleled luple~ are derived is ~hal spllrio deletions are not propagated. We make a rew observations and describe a few oplimization~ lo Ihe Pl;' aJgorithlll w de~-ril>eove ob.~rvatio are a basis ror integratillg the P[; algorithlll an(l the DRed algorillllll de~ribed in [(;Mli9:~ lo ol)lai an algorithm that does better than either of the two origiual algorithm~. .Reference~ BRt~6] l~rancois Bancilhon and Raghu l~anlakrisllllall. An anla~eur'~ inlro~luclion lo recur~ive query proce~ing ~Irategie~. In Prueecd~ng~ I)J A(.'M .51(:MoD 1986 Inlernalional (.'on- /erence on Managemerll ol Dala, page~ 16--52, Wa~hillgton D. (~., may 19136. [Die~7] S. W. Dietrich. Exlellsioll lable~: Melllo relalion~ in logic programming. In Pn~e~dl~lgs or lhc Fourlh IEEE ~S~/~npo8lu~n on Loaic Proarammina, pa~e~ 264 '~72, 19#7. [~IM~93] Ashi~h (lupla, Inderpal ~ingh Munlick, and V. ~. ~iubrahmanian. Mailllaining ViewY Incremelllally. In To aypear in .SI(,MoD, 1993. HD92] .lohn V. Harri~on and !;uzanne Dielri~ll. Mainlenall~e or Materialize~l View~ in a Deduc- live Dalaba~e: An llpdale Propagalioll Approach. In Worlc~hop on l)edul t-~ Dalaba~ Jl(,'.SLP 199~, page~ 56{;5, 199'~. [lJ1189] J. D. lJllman. PrinclpIc~ orl)alaba~e a~ld hnouledge-f~a8e .';y~le~n~, volunle '~. ('onlpuler ~cience Press, New York, 19t~9. CO~ENT: appear~ in SIGHOD ~3. Full v~r~ion ~pp~r~ a~ ~T~T Labor~torie~, T~chnical R~port #921214-19-T~ Maintaining Views Incrementally ~ knded Ab~lrac~ Ashish Gupta- S~lford Uluversi~y al~uv~a~c~ s anlord edu lnderpal Singh Mumick ATkT Bell Labora or e~ mumid~Or~earch ~-- com A bstract We ~resenl illcremenlal ~valualion algorilhm~ o compule chaluges lo ma~nalized views ill rela~ional Id deduc ive dalabax syslems, un response lo chane~es (in-,er ion~, d~lelions, al~d updaLe~) lo lhe relaliolls Tbe vie~v dcfiuulions can be in SQL or Da alog, and may use U~I0~, l1egalion, aggreKalion (e.g. SUI~ ), linear r~cu~ion, alld gelleral re-ursioll. we fusl ~>rexn~ a countlng algorilhm Iha- ùracl~s Ihe number or allernalive derivalions (counts) ror each denve~l luple in a view Tbe a4sorilhm worl~s ~vilh bolh sel and duplicale semanlics We prexnl Ihe al6orilhm ror rlonreCUrJlUe Ul~rD~ (wilh ne~Sa ion and a~ree,alion), uld show lhal Ihe cour~l lor a luple can be compuled al lillle or llo cosl above Lhe cosL or derivin~ Ihe ~upJe The alKorilhm is oplimal in Ihal i- compule~ e~aclly Iho~ view ~uples ~llaL ar~ inser~ed or dele~ed Nde Ih~ we slore ollly Ibe nurnber or denva ions, no~ Ihe derivalions em~elves we Ihell presenl Ihe Delele and Rederive alRorilhm, DRed, ~or illcremenlal mainlenance of recursive views (ne- Kali(~l~ alld aKKreKalion are permil~ed) 'Ihe ale,orilhm works l~y lir~- dele~inK a superse- or Ille luples IhaL need 10 l~e u-ed dalaba~cs The oplimi~ion all(>w~ a Klol~al colls~rain~ i c a cons rainl sp~u~in~ nlulliple da aLases o bc venfied by acces~in6 dala a~ sil~Kl~ daLabase elimilla~ing ~he co~l ol acc~ssilU remde ùla~a lhe op~imiza~ioll is oa~ed on all algorhhm Ihal lakc~ a~ uL a 6101>al cons~rain~ anal col~rainl is slill salislied~ Ir ~h~ lo~al da~a doc~ llO~ sali~ry Ihe colldilioll LlKn a coll~ iol~l ~lol~ crili~aLion pro edute is re~uire l 1 Introduction ~lear Irend in illrormalion syslems lecllnology i8 lle di~lrioulioll or relaled dala across ~llulliple sileff Su~h sy61em~ may vary rrol~l lighlly-coupled parallel dalal)a6es lo rederaled inrormalion ~yslems In all a~ (~lle be~lefil or dala dislribulion is lhal each ~iile pro-e~ il6 own dala wilh some degree of au- ny AlIhesamelime however si~lcedalaacrosff lllay be relaled Ihere may be cerlain global con- ellcy re~luire~nenls or Inlegrily corl~rainl~ on Ihe di~lribuled dala l~llegrily co~lslrai~lls speciry Ihose ~ollfi8uraliolls Or Ihe dala Illal are considered seman- li~ally corre~ egrily co~lslrainls have been dis- ~u~ced hl ~nally ror~ and are well a~cepled as a userul e< hallislll ror Illolfiloring and ellrorcing dala seman- i( ~ rl~ w~ ùu~ rled l~y NSF ~ran~ lRI-sl-16646 and IPI- 9r~ 16358, ;~nd AR0 DAAl,o~C-0177. Jennifer Widom IElM Almad~n l~arch (~ll~er 65u Harry 1~1 San Jo~e (~ 9512~6099 ~ido ~ de a nu~ >ering Nchcme ror versions wilb alleroativeN Ihal has a usetul lexicograph- i al or lerillg 1 he ver~iol~ hierarchy is a lree By inspeclion Or lhe verNion numbers, we can ea~ily (lelerll~ioe wbeLher olle versioll is an ances or or anolber lf o, we can delerlnine lhe versioll ~equellce belweell Ihe~e lwo versiolls lf nol, we can delermioe Ihe mosl recenl CommOD all~eslor ~o ll~e~e Iwo Ver~ionN (i e, Ihe leaNl upper bound, lub) Sorlin~ Ihe version number l lexico~rapllically re2iulls ill a version beillg fiollowed by all d~scendanls and preceded by all ilN all eslor~. We u~e a repre~enla~ioll Or nonnegalive inlegers ~hal is selr delimiling and whose lexicograpllical ordering malcheN Ihe ordering by value 1 Introduction ()ur nloLivatioll is a projecl in collaboralive design for Civil Engineering being carried out al ~tanrord Tlle approach taken by the project, whose name is CEDB ((',ollaborative Environment ror the Desigll of Building~) leads to an interesting problem in version numbering ~ur approach ~o version nulllbering is the subject of this paper ('EDB u~; a version and configuration management system that supports incremental checking or constraints (~ee Krishnamurthy [1993]) There is a hierarchical version system, with a separate version hierarchy for each design discipline (e g, plumbing, electrical, structural engineering) A (<~nfigurali~n consists of a version from each discipline together with a set of constraints that the configuratioll should satisfy (although we do not insist on constraint satisfaction in all situa~ions) A configllratioll ~'l may have an ancestor configuration (~O, in which each of the versions participating in configuration ~ must be a descendant of the corresponding version in configuration (,O The ('EDB conlpu~ational strategy is to check incrementally for constraint violations by comparing a configuration with an ancestor configuration This comparison relies on ~he determination of ~hallges ("deltas") lhat occurred between two versions our approacll is to record tlle incremental cballges between a version and its immediate ancestor A~ a resul~, if we need to find the challges that have occurred between versions A and B, ù e must lind ~h~ challges associated with all versions that lie between A and B in the version hierarclly lnlagill~ llla~ all cllallges are stored in a relation, along wiLll the versioll to which they perLaill ror example, if in going from versioll r to a child version C,', we insert the elemen~ a, then we migllt slore Llle e e ù~ tem- ~q S~ erai qu~q hnO~4e- b~ ~ b~eD propo ed ~ 201, but mo t of th~m ~re ai o d~pendent on their a rD dbt~ rnoWk Th~ P~n~uin q t~m ol~ the e probieme Tb~ P.n~uiD q~t~m h~- b~eD de~eioped to ~ r muhlpk ~ppGe~tbD- tb~t b re their o rn object chem~ ~b~re d~t~ iD ~ commor~ ùcb- tbnal d~t~be e [~ 4, 6 11 ~61 W~ b~-~ de ehped quer~ maD4er for tb~ P~n~uin ~ tem W~ di cu-- ha r ~ po~ erful object qu~q hD~u~ an be Upw-ted on top of ~ rehtion-l dbt~b~ e W~ brie~l~ introduc~ t~ d~t~ modek of Pen~uin iD th~ n~t ectbn, aDd tben d~ribe th~ queq hn~u4~ iD Sec- tbo ~ Our quer~ proceJ in~ ai~oritbm i- de cribed in Section 4 Som~ furtber i~ue~ ar~ ai o di-m_ed in Sectbn ~ 2 Penguin ~yste m P~n~uin~ b~r ~ multi-h~ered conc~ptu-l cb~m~ Tb~ h~- ~-~ ùK th~ r~htbn l b~r, tbe ùie~r object h~r ~nd th~ C++ obj~ct h~r W~ iliu tr t~ thh h~rin~ u in, an e~mple of dht bo e containin,~ dbt~ on comp~, d~p rtm~rt~ and em- plo~ee~ in FiJ 1 The object~ h~ ~ the tructure of compo it~ object~ rith th~ ne t~d tupk~ a~ foilo~rin~ ~lampk- IFor inforn tion ~bout the P~nt~uin project pk~ rite toArthur M K~ller St~ntord Uni~er it~ Comput~r S~ienc~ D~pt St~nford CA ~305-2140 or ~o ~r~b ~-~terd ~- |CO~ENT: ~ubmitt~d ~or public~tion, D~c~b~r 1~3 Implementation of Object View Query on a Relational Databasel Tetsuya Takahashi Arthur M Keller Kobe Sleel, Ltd ~tAn~rd Univer~ity Ab~tract We present the implementation of the query function for the Penguin sy~tem Penguin is an object-oriented database system, ùrhich support~ multiple object views on a relational databa~e It enables many applica- tions to share a database using different ob~ect ~chemata Al-o, wer- can ta~e queries for the Penguin databa e in their application~, to retrieve ob- jects on the heterogeneous data model The query function is very po-rerful for manufacturing and engineering purpo~es, that is, connection- bet--een relstions and methods defined in applicationr by wers are a~railable in query requests We introduce the object model and the query e~ecution algorithm in Penguin system, and show the system configuratioD for the implementa- tion of the query function 1 Introduction In the field of msnufacturing and eng neering, it is required for database systems to handle complicated structures of data, such as de ign data and production data in the many processes of manufacturing In those applications, the data are clo-ely related to each other, and we ha~e to manage such relation hips But it is well l~nown fact that the relational data model is not nece~sarily ùuitable to represent such a comple~ schema And many new types of database systems, such as object-oriented database~, are tried to apply On the other hand, the relational model is so widely accepted that most new database applications are based on the relational model today, because of its simple description and mathematically elegant theory As the result, it is now almost impo~sible to replace all the e~istin~ database aPPlication- IFo~ inform~tion ùbout th~ Pen6uin p~Ojfft, ple~ e ùrrite to Arthur M Keller,St n~ord Uni~rer~it~, Computer Science Dept, St~nford, CA 9~305-21~10, or to cr~db ùtuntord ùtu CO~ENT: ~ubmitted ~or journal publication. || The ~igure~ ar~ ~ad~ using ~r~a~r and do not al~a~ print right (ir ~o, ~nd ùail to aguptaOc~.~tan~or~! ~U) _ Constraint Management on Distributed Design Configurations ~anjail'ivvari Ashish (;upta l)epar~ elll or Civil ~l~gineeriug Deparlmelll o~ù,olllpuler Science !~lalltotd lJIliversilySlanrord Ulliver~ily ~lallrord (-,A 5~4~0~4020 Slanrord, CA 9430~2140 ti~rariOci~ ~tan~ord duagupla~cs slanrord edu ~u~milled lo Journal of Env~ncenna with Comvul~rJ Ab~tract ~ r hile~lure, ~llgirleerhlg and Conhlruclion (AEC) de8i6ll processes involve Ille parlicipa- ioll or lllally de~igller~ who lnay be working independenlly in geo~raphically dislincl localions ~ lepelldelll de~igll~ evolve due lo challge~ lllade by lhe designers and discipline-specific CAI) appli-aliolls alld llleae challge~ lleed Lo be evaJualed and configured wllh re~pecl lo a sel or proje~l coll~lrailll~ ror overall design consi~lellcy lll addilion, each designer needs lo be ill- filrlllell aboul lhe relevalll changes l~lade by olher designers Mosl rework orders resull rron hlade~uale sharillg or llle evolving de8i8ll informalioll belween various parlicipanls ill lhe design prole~ I he awarelles~ or il~lporlanl changes Ihrougll aulo~laled collhlrainl lnanage~lenl cu avoid expell~ive reworks al Ihe laler slage~ of Illc projecls paper, we will presenl an illcremenlal approa~h lo conslraiul checlcing in AEC design collfiguralions A ~o~lfiguralion represenls a de~ign slale slored i~l mulliple dalabar~es eacl ~ollfiguralioll has a sel or dl~clpllne-speclfic dalabases and ul a~80cialed sel of Inler-dt~cipllrlar~ ~orl~lra~nls lhe conrilrainls are evalualed on lhe dalaba~es lo give Ihe Ihird componenl of collfiguraliolls a ~eL or tlolallons. lllcludillg violalio~l~ in Ihe configuralioll definilioll allows us lo supporl parlially consi~Lelll configuralions Parlially consislenl configuralions are imporlar rn~lll a praclical sla~ldpoilll sillce all llle design inror~lalion ~nay nol be available, or may be coll~islenl, al all slages or Ille Al;(~ de~ign proceas We will provide a ror~lal desc-riplio~l ror collfiguraLioll~ arld Ille senlallLics or challge~ oll configuralioll6 T his rrameworl~ racililales ill( relnelllal collslrailll cllecking and allows us lo supporl lhe nolions or persi~tenl and whal- IJ ~ollfiguralioll~ We al~o provide a cla~ificalioll or collslrain~ Ihal is llelprul in ~eleclive evalualioll or coll~lrailll9. 1 Introduction 11l a (ollal)oralive de~igll ell~ironnlellt, di~rerent participants need to sllare inrormatioll a~>out tlle critical a~pects or tlleir de~igll lata and tlle cllallges tllat occur in tlleir de~ign Tlle cllallge~ re~ult rr ~ line tUllillg or desigll and ree~aluatioll or tlle d~ign goals a~ tlle de~ign progres~ Tllererore il is ll~e~ar!~ to 1~ le to llalldle tlle e~olutio n Or dis(ipline-specilic esiglls and ~e al~le t~ ull~ti(.~ ol)~ uilll (llan~i thlollgllout th~ ~;igll proce~ Eacll AE(~ desigll c~llligur;Lti~n Ri~al rule~ u lleRa~ UI~R.~ ill rule~ X~re~ all a~l~ ale rallKe ~f (~llerie~ Adllerell~e lo ~lassi~al ~ledllcli~e r~rrly oller~ Ille ill~ ely ~a~rre~ eu~illR of Ihe rldr~ Ihu~, a ~ariel! of a~ roa he~ lo defi~illR Ihe rl~ '' IllralfillR ~f sulh rule~ h~r leell developed 1 llli ~Jal~rr we ~ur~r! Ihe pril~ipaJ apl)r-)a~hes, illl ludin~ r.~lilir~l lle~ ell-f~ullde l l~eRalio~ al~le~ odel ~rlllallli~, alld Ill~dularl~ ~Iralified ~elllu~ 1. Motivation I)alal~a~e ~uer! lall~uaRe~ l~a~ed ol~ loRic il~ ~ollle for arr a l~rull~ Irelld hl Ihe de~elop~l-enl of ll~oderll lalal a~r ~!~lrlll~ Uuildillg oll SQL, a languaKe whose fi al llalure i~ hidden l~y Ihe ~y~lax, loKical 4uery la~l~uaRe~ l~re~erve Llle de-lara~ e '~ay whal you wanl, ~ 1 IU,W lo Rel il" llalure of ~SQL, while exlel~di~K Ihe exl~re~i~elle~ Or Ihe lallgua~e l~eyond whaI ~QL pro- i.le~ ~iee l~al~akri~h~arl al~d lJll~arl 1199:1] rOr a sur- ~,r la~lguage~ a~ld sv~lell~ a~ed oll Ille logical or "dedllclive" approacll. !~illl~>le fi~ ot logic, ~uch as llorll-clau~e rule~ (ir- lllrll rule~) have a ulli Jue nalural Illeallillg, whicll is Ihelrail fix~filll or Illhlilllal Illodel ~f Llle rule~ Ilowever, !' illlere~llllR 4uerie~ re~uire u~ Io u~e Illore general -rlll, ol rule~, ~uch a~ Ih~e ill wbicll olle or Illore ~uo- .)al~ rar llegali ely 11 i~ oll Illr~e exlell~ioll~ Llla ur r~ u~ I he l~r >l~lelll willl logic IllaI ill ~1 e~ lleRalive suL~ llal Lllere i~ rarely a Ul~i ~ue ll~ilfilllal Illodel 1 ellli ~llal logi(-, Ihe rulr~ ~lld L~e ~8hl lo "Illea~l" uhillr~rr ~ all l~gi~all! Le illr~rrrd r~ Ihffe rul~ arld ,llfillg Ill ~re I haI i~ Ille ~allle a~ ~ayillg Ille Illear~- il'R i~ Ihe i~l~r~be~ r all Ihe ll~ilfilllal Illo lel~ llow- e rr Ih~ rr~ ~i )ll i~ rarely hal Ihe l~r ~grallllller ili rl! rxl,r 1, Ihr rlde 1-- llleall lhu~ rr~ear h I. 1~1-1; - ùh~ldl~a~e~ illl llegali Ul all l~r ~eell a~ ~.-r~ ~u~ .,rlr.J 1~ ~'il 6ra~ 9~ 21)10.~ and ,tlJl) a series or propo~al~ Io defille on parIi~ular Illillill~al odel as Ille IlleallillK of rule~ willl neKalioll Io ~IIIe legree, Ihi~ l~r>Rrall~ or res arch ha~ l-eell ~uc~e~rul, alld Ihere r irllporlanl cla~es or logical rul ~ will ~eRalion rOr which Ihere i~ a widely u-cepIed 1~ ar il~g 11l olher ù ase~, Ihe proper ~hoil-e ot ll~eanil~g i~ le~ clear I he ~uojecl ~>r llegalion ill dedu- live llalal~a~ i~ relaIed Io Ille ll~aller of llolln~ol~olol~ic rea~ l~ R a~ di~ cu~ed i~l Lhe lileraIure or ArIificial IlllelliRellce I here i~ a ellden~y iu Ihe dalaL~alle con~ ul~il! Io value ~ore Ihe i~ue~ ~r efr~ fie~cy, i e, hou rasl e ~a~l a-l~wer ùlUeriff u~ s a givel~ definilion Or Ihe -orre-I llfillilllal ll~odel I he Al a~l~)roa lle~, on Ille olller hal-d le~ld Io rOcus oll Ihe correcL~IeYs of Ihe l~odel, regardle~s or ils ~raclal~ y Ihe difrerence will l~ eln~Jlla~ized wllen we colll~are well-roullded alld slal~le-ll~odel a~ roachff l~elow 2. Introduction Logical rules are usually ex~res~ed il~ Ihe ror llead ~ Uody where Ihe head i~ all alolllic f ~rlllula (~redicale ilh argunlel~) alld Ille oody is Ille logical AN L~ or lileral~, i e, alolllic forll~ulas or Iheir ~ega~iol~ ~;ull lileral i~ called a ~u~oal A collecIiol~ or rule~ is otIel~ ùalled a 1~- - yn)yran~, or jusl yr ~gnam re lica~es are divided inlo Iwo classes ~I)U (exlel~siol~al dalal~ase) ~redi alff are sl)red as relaIiolls ill Ille Jalal~ase Ir y is all t:ou llred- ica~e, lllell Illere Kill t~e a ù~rres~)olldhlR relaIioll, say 1~ Ihe dalal~ase and y(al, ,a~,) is Irue ir alld only ir Illere i~ a tu~le (al, a~ ! ill relalio I lle li:L~U lul~les are ~ollle~ es called ùlata 11)13 (il~lel~siol~al dalallase) l~redicaLes, whi-ll are I fi~led l y Llle rules olll II)U ~)reli a-es a~l a~ ~ear ill Ill heal )f a rule Ih~lll II)U a~ld t:l)U yr~ licales all a~ ear ill Ille l~o 1 Exall~ llere are rules rOr ~ ulillg Ille Irallsili I ~sllr~ )r a gra~)ll I hal i~ ar is all l:l)U ~r di- ;-le all l Ir ( l b) Ill all~ Illal Iller~ i~ all ar rr ~1ll 1l ~1.- ||CQ~ENT: Sub~itt~d 12/~3 ror journal publication ~ | (cut-~nd-pa~t~ ur~ ing rro~ p~ The Starburst Active Da~abase Rule System Jenni~er Widom~ Deparlmenl or Computer Science Stanford Uni~rer~ Slan~ord, CA 9430~2140 widomOc~ ~lanford edu Ab~tr~et This paper describe our developmenl of the Starbur~t Rule S~tem, an acti-e dat~baJe ~ule~ facilil~ inlegraled inlo Ihe Slarbur l e~len~ible databa e ~tem at the IBM Almaden Re earch Cenlel The Slarbur~l rule language i~ b~ed on arbilrar~r d~t~ba e ~t~te tran itioo~ rather than tuple- or slalement-te~lel change~, ~rielding a clear arld fle~ible e~ecution emanticJ The rule ~y~lem was implemenled rapidl~ u~ing the e~ten~ibility featur~ of Starbunl nd i~ inlegrated inlo all a~pecl- of dalabaJe proce 5in8 Inde~ term~: actlve data~a~e J~tC~ datahuc plvd~ ion ~k~, ~z~cn~i~lc ll~t~ hJC ~ cmJ~ czper~ da~aba c Jy~tCrlu 1 Introduction Active database systems allow users to create mlc~rules specify data manipulation operations to be e~ecuted automatically whenever certain e~rents occur or condition~ are met. Active database rules provide a general and powerful mechanism for traditional database features such as integrity constraint enforcement, view maintenance, and authorization chec~ing; acti~re database rules also support non-traditional databa e features such as version management and worl~flow control. Be- cause active database rules are similar to the forward-chaining prot~ction r~les used by Artificial Intelligence applications, active database systems also provide a convenient and efficient platform for large and efficient }nowledge bases and espert systems. In this paper we describe our development of the Star~urJt Rule SYJtCm, an active database e~tension to the Starburst prototype e~tensible relational database system at the IBM Almaden Research Center. We cover both our design of the rule lanKuage and our implementation of rule processing as an estension to Starburst. The Starburst rule language diffen from most other active database rule languages in that it is based on arbitrary database state transitions rather than tuple- or statement-level changes, permitting an e~ecution semantics that is both cleanly-defined and fle~ible. The implementation of the Starburst Rule Sy-tem was completed rapidly and relies heavily on the estensibility features of Starburst. The Starburst rule processor differs from most other active database rule systems in that it is fully functional, and it is completely integrated into all aspects of database processing, including query and transaction processing, concurrency control, r~llback recovery, error handling, and authorization. This wori~ ~v~ perrorm~d wMI~ th~ ~uthor Iva~ ùt th~ IBM Alm~d~n ~e~arch c~nter, s~n ùo~ CA w ~|COtt~lENT: to bo publ~shed by Spring~r-Vorlag | The Electronic Library of the Future: Accessing Worldwide Information Tak W Yan and Hector Garcia-Molina Department of Computer Scienee, Staoford Univenity, Stanford, CA ~305 {t~an, h-ctor}Qc- ùtan~ord ù~h ù h~tr~t Computer~ and networl~ are rapidly ma~ing a~railable va t amount- of electronic information At the ame time, traditional ource- of information and entertainment such a~ boo~, paper~, movie~, and mu~ic are being pro~ided di6itally Thu-, people ~ill have at their fingertip~, either at home or at the omce, a ma i-e Uelectronic librarr" that will fundamentally change ho--tbey acce# nd proce~ information In thi~ paper ùre di cur~ ome of the emerging i- ue~ in electronic librarie-, ~uch a~ heterogeneou~ acce~, information fu~ion, information finding, #lecti-e di- emination of information, and intellectual property right~ In addition, ùre pre#nt a brief o-er~ie~ of a ne-v Elfftronic Library project at Stanford and four other uni~rer~itie- (Ber~eky, CMU, Cornell, and MIT), ~pon ored by ARPA and CNRl 1 Introduction Computers and networl~s are rapidly malting a~railable vast amounts of electronic infor- mation At the same time, traditional sources of information and entertainment such as books, papers, movies, and music are beinB provided digitally Thus, people will have at their fingertips, either at home or at the offlce, a massive "electronic library" that will fundamentally change how they access and process information Already today, computers and networks are increasingly used for retrieving information on a wide variety of ways Conventional library systems, such as FOLIO at Stanford and CO~ENT: to appear in TODS. Index Structures for Information Filtering Under the Vector Space Model Tak W. Yan and Hector Garcia-Molina Department of Computer Science, Stanford Uni~ersity, Stanford, CA 94305 November 8, 1993 Ab~tr~et With th~ ~v~r incrcn in6 volume~ of inform~tion 6ener-tion, u er~ of info~m~tion ~km~ re t~cin~ ~n inform~tion o-erlo d It i- de ir-ble to ~upport inform~tion filtcrin6 n~ ~ compl~ment to Ir-dition~l retrie-al mecha~ m The number of u~er~, nd thu~ profile~ (repre entin~ u er~' lon6-term intere~t~), h ndled br ùn inform~tion filterin6 ~tem i- potentiall~ bu6e, ~nd tbe ~tem b~ to procec ~ con-t Dt ~tr~ m of incomin6 iDform-tion in ~ timd~ f~hion The ~tficienc~ of the tilterin6 procec~ i- tbu~ n import~nt iAUe In thi~ p~per, ~e ~tud~ ~b-t d-t~ ~trocture~ nd al6orithm~ c n be u ed to efficientl~ perform l~r6~ nl~ inform~tion filtering under Ibe ùector ~p ce modd, ù retriev l model e-t-bli-bed ~ bein6 effecti~e We ~pplr tbe ide~ of the ~t nd~rd iD-erted inde~ to inde~ u-er profile~ We d~vi~e ~n ~Itern-tive to the ~t~nd rd in-erted index, in ~hich ~e, in~te d of inde~in6 e-er~ term in ~ profile, elect onl~ the i6nific~nt one~ to inde~ We e~lu~te their perform~nce ~nd ~ho~ th-t the inde~in6 method~ require order~ of m-6nitude fe~rer l/O~ to procer ~ document tb~ ~hen no inde~ u ed We ~bo ~ho~ th-t tbe propo ed altern~ti-e per~orm~ better in lerm~ of 1/0 ~nd CPU p-oce in6 time in m~nr ca~e~ Intr~dll~ti~n Intormation i~ increasingty available in electronic form Tbe number and ~i e of full te~t document database~ are rapidly increa ing User~ of ~ucb databa e t~y~tem~ are facing an information over- Thi~ re~e~rch ~.e ~pon or~d b~ ~h~ Ad~nc~d Re e rch Projec-- A~ency (ARPA) of th~ D~p~rtm~nl of Def~n e under Cr nt No.MD~072-~12-J-102~ ~ith th~ corpor~tion for N~tionnl R~e rch Initi~ti~e~ (CNRI) Th~ nd conclu~ion~ con~in~d in thi~ docum~nt r~ tho e of th~ ~uthor~ nd ~houJd not b~ int~rpr~t~d ~ n~ce-- ril~ repr~- ~entin~ ~he offici~ policie~ o, endor em~nt ~ith~r ~pre~ ed or impli~d, of ARPA th~ U s Cov~rnm~nt or CNRI ~ ~ ll CO~ENT: Long v~rsion is STAN-CS-93-1494. Short ~er~ion . to be publish~d in ICDE '~4. . _ Index Structures for Selective Dissemination of Information Under the Boolean Model ~ Talt W. Yan and Hcctor Garcia-Molina Department of Computer Science, Stanford Univer~ity, Stanford, CA 94305 December 15, 1993 ~b-~r~ct Th~ number, ~ , nd u er popul tioD of bibliol~r~phic ~d full t~ct document d-t-b~ ùr~ r~pidl~ ~ro~vine, With ~ hi~h docum~Dt rri-nl r~k, it become~ e ~nti~l for u er- of ~uch d~t~b~e~ to h~v~ cce~ Io th~ ùer~ I-te t docum~nt~; ~et the hi6h document rri--l r~t~ bo m~ it difficult for th~ u er~ ùo ~eep th~m el~ upd~ted It i~ de ir~ble to ùllo-- u er~ to ~ub crib~ profile~ , qu~rie~ th-- r~ con~t ntl~ e-nlu-ted, ~o th~t th~ ill be ùutom--ic 11~ in~orm~d o~ n~ dditiou~ th-t m-~ be of intere~t Such er~ice i~ Ir dition-ll~ called Selecti-e Dirc~min-tion of Inform~tion (SDI) Th~ hi~h docum~nt ~rrirul r~t~, the hu~e number of u~er~, ~nd th~ timdin~rc requirement of th~ ~rvic~ po e ~ ch~ n~ in chienn6 efficient SDI Iu thi~ p p~r, ~ propo e e er-l ind~ ~tructure~ for ind~ain6 profile- ~nd al~orithm~ th~t ~ffici~nll~ m~tch docum~nt- 4-in~l l~r~ numb~r of profile~ W~ ùbo pre ent ùn~ nd ~imul~tion~ r~ult~ to comp re their perform-nce under differ~nt ~c~n~no~ 1 Introduction With tbc impro~ing co t effecti~rene~ of tlecondarr titorage and the e~panding ~rolume ofdi8itied te~tual data, the number and ùi~e of bibliographic and ~ull te~t documenl databat~ are rapidly 'Thi~ Krc rch -~ ~pon ored br ~he Ad- nced R~-c rch Proj~ct~ Al~enc~ (A~PA) of ~h~ D~p rtment of Def~nK under cr n~ No MDA~72 92-~-102~ h ~h~ corp~r~tion for N~tion~l Re~rch Ini~ e (CNRI) Th~ nd conciu~ion- cont~in~d in ~hi~ docum~n- r~ tho e of Ih~ ~uthor~ nd hould not b~ int~rpret~d ù- n~ce- ril~ KpK- ~en~in~ the olqici~l policie- or ~ndor-ement, either e~pre- ed or implied, of ARPA, the U s Go-ernment or CNRI