Thu Aug 14 10:08:31 PDT 1997 Construction of the logical query plans ======================================= This document is valid as of 8/14/97. The Logical Query Plan Manager ------------------------------ This document describes how query plans without indexes are created. In the following descriptions, we don't consider any query plan with indexes at all. The logical query plan manager gets a canonicalized query and translates it into a logical query plan. It uses a top down approach to construct a logical query plan; It creates a query plan while traversing a canonicalized query from top to bottom. A canonicalized query plan consists of nodes which are subclasses of the Node class. As a base class, the Node class has a 'type' variable to specify the actual class. Therefore, the logical query plan manager traverses a plan consisting of Node class objects, check the type variable, and then, do something appropriate for derived class nodes. Thus, it is convenient to have a function which is called recursively while traversing a plan and calls functions suited for a node currently processed: 'MakeLQP' is the function. Before explaining about the MakeLQP function itself, we describe the general view about how a query plan is created. Let's think about the following simple query. select c from A.B b, b.C c where b.D = 10; Generally, a select-from-where query(SFW query) is constructed as follows; construct a plan for a From clause, put a plan for a Where clause on top of the plan for a From clause, and finally, put a plan for a Select clause on top of them. The logical query plan for this query look like: +-Sel---Aggr---Sel---Sc D | exist D=10 | P-Cs-J-+ +-Sc C | | +-J-+ +--Sc B | | +-J-+ | +--Sc A Notice that where clause is actually translated into the following existential predicate. ${}^\exists$b.D{d} in A.B{b}: d = 10 Any queries result in a bag (or a set if DISTINCT is specified) of objects. The CreateSet node located at the second to the top creates the actual result. In this example, there's no nodes corresponding to the Select clause since C is already picked up in the From clause. Next example query is: select b.C from A.B b; The logical query plan for this query look like: +-P--Cs--Sc C | | P-Cs-J-+ +--Sc B | | +-J-+ | +--Sc A In this example, the query means creating a set of C's for each B. Therefore, we need to create a set of C's for each B, instead of creating a set of C's for all of B's. The plan of P-Cs-Sc corresponds to the Select clause and the reason why we use the CreateSet node here is already explained. But somehow, we have to have a way to tell the node that the created set is labeled by B. This mechanism is described later in the description of LQPCreateSet. The last example is select A.B.C, A.B;. Here we specify two path expressions in the Select clause. First, this query is translated into the following query. select b.C, b from A.B b; The logical query plan is as follows. +--Cs | +--Set---+ | UNION | | +--P--Cs--Sc C P-Cs-J-+ | +-Sc B | | +-J-+ | +-Sc A Two important structures can be seen in this logical plan. First, if there are more than one items in a Select clause, the logical query manager creates a Select plan for each item and them unite them using UNION operations. Due to the structure of the OEM, children objects linked to a parent object are easily distinguished by their labels, and thus we can combine the results by UNION without problems. Another point is that the system creates just a CreateSet node for a selection of 'b', and the CreateSet will create a set containing only one 'b' object scanned by Scan node labeled with 'B'. This means CreateSet nodes may not have a child plan and may need some special treatment for that case. There is no strong reason why the selection plan for 'b' doesn't have a Project node. It could have it, but not needed. MakeLQP function ---------------- The MakeLQP has five arguments. The first argument is a pointer to a node currently processed, and the second one is a reference of a pointer to a partial logical query plan created by processing this and underneath nodes. The third one is a pointer to a void, but this pointer is actually a pointer to either a Prefix object or a PrefixSet object. As of February, 1997, a construction of a query plan for SelectListNode is the only case where a pointer to a PrefixSet object is passed as an argument, because we may get many results from an underneath plan. The fourth argument is a prefix to specify what should be the label for the created result of a bags. If the underneath plan will not generate bags of objects, this prefix is not used. The INVALID_PREFIX is passed as a default value when this prefix is not specified, and the result bag is labeled using the same label of the objects contained the bag. The fifth argument is a boolean flag and if this flag is set, the underneath plan will not put a CreateSet node on the top of the plan. This flag is currently used in two cases. The first case is aggregations in value queries. Since aggregation nodes may call underneath plan exhaustively to calculate aggregated values, we don't need to create a bag for the aggregated plan, therefore we don't need to have a CreateSet node on top of the plan. The second case is predicates with existential quantifiers. Once we find some objects satisfying the predicate, we need to shortcircuit the GetNext call to the underneath plan. Thus making a bag of objects by using a CreateSet is useless. Descriptions about constructors and internal variables of each LQPNode class ---------------------------------------------------------------------------- N.B. I didn't describe internal variables which only hold values passed as arguments. If a node assume that a result is an object, the node will manipulate itself as an actual result. If a node assume that it is a set, this means the node will open a complex iterator over the returned object, handle children objects as actual results. LQPProject ---------- This node assumes that a result returned by the underneath node is an object, not a set of object. LQPProject(const PrefixSet &psSchema, LQPNode *plnChild, int flag = 0); psSchema: Prefixes of OA slots which this node will project out. They are the target prefixes of the underneath plan. plnChild: a pointer to the top node of the underneath plan flag: If this flag is set to 1, a physical project plan corresponding to this node would not generated. psDone: prefixes of OA slots that underneath nodes uses but are not projected out. This is passed to a corresponding physical operator and used to free memories allocated in these OA slots. LQPSelect --------- This node assumes that a result returned by the underneath node is an object, not a set of object. LQPSelect(LQPNode *plnChild, Node *pPred); plnChild: A pointer to the top node of the underneath plan. pPred: A pointer to a node with information of predicates. The actual type of the node is PredicateNode. LQPAggr ------- This node assumes that a result returned by the underneath node is an object, not a set of object. LQPAggr(LQPNode *plnChild, const Prefix &pfxAggr, enum AggrOperator aggrOp, Prefix pfxTarget); plnChild: A pointer to the top node of the underneath plan pfxAggr: Objects in the OA slot specified by this prefix will be aggregated. aggrOp: An aggregation operator. pfxTarget: A prefix where the result object of this aggregation will be placed. LQPSet ------ This node assumes that a result returned by the underneath node is a set of object, not an object. In the DISTINCT case, we don't have a right child. For UNION, INTERSECT, EXCEPT LQPSet(LQPNode *plnLeft, enum SetOperator setOp, LQPNode *plnRight, Prefix pfxTarget, Prefix pfxFrom = INVALID_PREFIX); For DISTINCT LQPSet(LQPNode *plnLeft, enum SetOperator setOp, Prefix pfxTarget, Prefix pfxFrom = INVALID_PREFIX); plnLeft,plnRight: a pointer to the top node of the left and right child's plan setOp: A set operator pfxTarget: A prefix where the result object of this aggregation will be placed pfxFrom: If this prefix is passed, the label of the result will the same as the label of the OA slot specified this prefix. Otherwise, the result's label will be a default label. Example: select r.Entree from Frodos.Restaurant r; leftSrc, rightSrc: Prefixes of OA slots where the result from left and right child will be placed. LQPArith ---------- This node assumes that a result returned by the underneath node is a set of object, not an object. LQPArith(LQPNode *plnLeft, CompoundOperator cOp, LQPNode *plnRight, Prefix pfxTarget); plnLeft,plnRight: a pointer to the top node of the left and right child's plan cOp: An arithmetic operator pfxTarget: A prefix where the result object of this arithmetic operation will be placed leftSrc, rightSrc: Prefixes of OA slots where the result from left and right child will be placed. LQPCreateSet ------------ This node assumes that a result returned by the underneath node is an object, not a set of object. LQPCreateSet(LQPNode *plnPlan, PrefixSet srcPfxSet, Prefix tgtPfx, Prefix fromPfx); LQPCreateSet(LQPNode *plnPlan, Prefix srcPfx, Prefix tgtPfx, Prefix fromPfx); LQPCreateSet(LQPNode *plnPlan, Prefix srcPfx, Prefix tgtPfx, Boolean bFromPfx, PfxOrLbl uFrom); The last constructor has not been used but will be needed if we support AS specifier, because in the case, the specifier may have not been registered in a label manager and may have just assigned a label, meaning we can't find the label in any OA slots. Therefore we have to specify the label explicitly. This may be used for OEM cases, too. N.B. The label manager may degrade the performance if we ask hundreds of queries with different AS specifiers not in the database. Since it has to store all of labels in one complex value, these queries make the value huge, thus making searches for labels used in the database slow. I think the scope of the AS specifiers is just in the query, we'd better have a mechanism to assign volatile labels. plnPlan: A pointer to the top node of the child's plan. This could be NULL. If this is not NULL, a physical operator will call Open, GetNext, Close functions for the underneath node, but if NULL, it creates a set containing an element/elements copied from the OA slot/slots specified srcPfx/srcPfxSet. cOp: An arithmetic operator srcPfxSet, srcPfx: A prefix or a set of prefixes from which this node get objects to create a set. tgtPfx: A prefix where the result object of this arithmetic operation will be placed fromPfx: If this prefix is passed, the label of the result will the same as the label of the OA slot specified this prefix. Otherwise, the result's label will be a default label. Example: select r.Entree from Frodos.Restaurant r; bFromPfx: Boolean constant. True if the uFrom contains a prefix and FALSE if it contains label. uFrom: A union value, either a prefix or a label. This could be a short int variable since both of them are short int type right now, but we may want to change the types. srcPfxSet: Contains the source prefix(es). Even the second or third constructors are used, they are transformed into a prefix set. bDistFlag: A flag to specify if this create set node should create a set without duplication. NOT USED YET. Set operator above this create set node does the same thing now, but we'd better do here. /* The following nodes will be described by Jason */ LQPScan ---------- LQPScan(SpeNode *pSpe); pSpe: LQPJoin ---------- LQPJoin(LQPNode *plnLeft, LQPNode *plnRight, Node *pPred = NULL); LQPNode *plnLeft; LQPNode *plnRight; Node *pPred; About the greatest common path expression ----------------------------------------- If a query doesn't have a From clause, the system generates it for the query by finding the greatest common path expressions among items in a Select clause. For example, a query, select A.B.C, A.B.D;, will be translated into select b.C, b.D from A.B b;. This is also true for value queries, queries containing arithmetic operations. Thus, select ((A.B.C)+(A.B.D)); will be translated into select ((b.C)+(b.D)) from A.B b; # # How about select A.B.F, ((A.B.C.D)+(A.B.C.E));? # Do we make A.B as the greatest common path expression for the entire query? # Or the entire query doesn't have any GCPE but the arithmetic subquery # has A.B.C as GCPE? # For set operations, the system doesn't generate any the greatest common path expressions. For example, a query, (A.B.C) union (A.B.D);, will create a set of C's for all of A.B's and a set of D's for all of A.B's, and then, take a union of the two sets. In other words, it won't handled like select ((b.C) union (b.D) from A.B b;.