RankSQL: Query Algebra and Optimization for Relational Top-k

RankSQL: Query Algebra and Optimization for Relational Top-k

RankSQL: Query Algebra and Optimization for Relational Top-k Queries AUTHORS: Chengkai Li Kevin Chen-Chuan Chang Ihab F. Ilyas Sumin Song Presenter: Roman Yarovoy October 3, 2007 1 Before RankSQL Ranking (top-k) queries: Query result is sorted by rank and limited to top k results. Support for ranking was lacking from RDBMS. Previously, isolated cases of top-k query processing were studied. No way to integrate top-k operations with other relational operations. 2 Previous (traditional) approach

Query processing without ranking support: 1. Evaluate select-project-join (SPJ) query and materialize the result. 2. Sort the result according to a given ranking function. 3. Take only top k tuples. Associated problems: No interest in total order of all the results. Evaluating ranking function(s) can be expensive. 3 Key contribution Li et al. proposed: Extending relational algebra to support ranking as a first-class database construct. Consequence: Rank-aware relational query engine Rank-aware query optimization. 4

Top-k query: Example 1 R T TID a1 a2 p1 p2 p3 TID b1 b2 p4 p5 r1

27 50 0.5 0.3 0.25 t1 47 55 0.5 0.1 r2 17 60 0.6

0.5 0.65 t2 66 65 0.5 0.7 r3 47 90 0.7 0.7 0.95 t3 27

15 0.5 0.8 r4 87 10 0.8 0.1 0.35 t4 99 95 0.5 0.2

r5 47 70 0.9 0.1 0.75 5 Example 1 (contd) SELECT * FROM R r, T t WHERE r.a1=t.b1 AND r.a2>t.b2 ORDER-BY p1+p2+p3+p4+p5 LIMIT 2 TID a1 a2 b1 b2

F r3/t1 47 90 47 55 2.95 r1/t3 27 50 27 15 2.35 r5/t1

47 70 47 55 2.35 (where F = p1+p2+p3+p4+p5) 6 Rank-relational algebra There was no way to express such query in relational algebra. Extend relational algebra by adding rank as a first-class operation. Based on the observations of first-class constructs (eg. selection), two requirements are needed to support ranking: 1. 2.

Splitting Predicate-by-predicate rank evaluation. Interleaving Swapping rank operator with other operators (i.e. ranking is not only applied after filtering). 7 Ranking Principle Def: Given a ranking function F and a set of evaluated predicates P={p1, p2, , pn}, maximal-possible score of a tuple t is defined as: Ranking Principle: If FP[t1] > FP[t2], then t1 must be ranked before t2. 8 Rank-Relation Def: For monotonic scoring function F(p1, , pn) and a subset P of {p1, , pn}, a relation R augmented with ranking induced by P is called a rank-relation, denoted by RP. Implicit attribute of RP is the score of tuple t, that is FP[t]. Order relationship of RP : For all t1, t2 RP : t1 < RP t2 FP[t1] < FP[t2] 9 Operators of rank-relations Rank (or ) operator adds a predicate p to set P. i.e.

p(RP) R P U{p}. Example 2: p1(R{p2}) R{p1, p2}, where F=(p1, p2, p3). TID a1 a2 p1 p2 p3 r3 47 90 0.7 0.7

0.95 2.4 r2 17 60 0.6 0.5 0.65 2.1 r5 47 70 0.9 0.1 0.75

2.0 r4 87 10 0.8 0.1 0.35 1.9 r1 27 50 0.5 0.3 0.25

1.8 F{p1, p2} 10 Extended operators 11 Example 3: Extended Join a1,a2,b2(c (R{p1, p2 p3} JOIN T{p4, p5})) SELECT r.a1, r.a2, t.b1 FROM R r, T t WHERE c ORDER-BY F LIMIT 2 TID a1 a2 b2 F {p1, p2, p3, p4, p5}

r2/t4 17 60 99 2.45 r4/t4 87 10 99 1.95 r1/t4 27 50 99 1.75 (F = P and c = r.a1+r.a2 < t.b1) 12 Extended operators (contd) Note: Cartesian product is defined similarly to join, but not discussed in the paper. Projection operator has not changed. Computation is based on both Boolean and ranking logical properties. Perform Boolean operations and maintain the order induced by all given ranking predicates. 13 Equivalence relations

In the extended rank-relational model, ranking is a firstclass construct. Can derive algebraic equivalences from the definitions of operators (Proofs are omitted). Example 4: 1. c(RP) (cR)P 2. RP1 TP2 (R T)P1 U P2 Thus, we can interleave the rank operator with other operators (i.e. push down across operators). 14 Equivalence relations (contd) 15 Equivalence relations (contd) Note: Proposition 1 states that ranking can be done in stages (i.e. one predicate at the time). By Propositions 2, 3, and 4, the relations hold commutative and associative laws. By Propositions 4 and 5, can be swapped with other operators.

16 Incremental execution Blocking operators (eg. sort) lead to materialization of intermediate results. Goal: To avoid materialization and implement a pipelining execution strategy. We want to split rank computation into stages and to reduce the number of tuples considered in the upcoming stages. We can output (i.e. advance to the next stage) a tuple t, whenever t has a score which is greater or equal to the score of any future tuple t . 17 Incremental execution (contd) Apply p to RP and maintain priority queue ordered by P U{p}. Let X = set of tuples from preceding stage. Draw t from X. If FP U{p}[t] FP[t] and future t drawn from x, FP[t] FP[t] for any

then FP U{p}[t] FP U{p}[t] and t can be output (proceed to next stage). 18 Example 5: Top 2 of W Given F = AVG(p6, p7, p8) idxScanp6(W) TID x p6 p7 p8 F p7 p8 TID x F TID x F

w1 3 0.9 0.8 0.2 29/30 w1 3 9/10 w1 3 19/30 w2 7 0.8

0.7 0.1 14/15 w2 7 5/6 w4 1 3/5 5 0.7 0.6 0.1 5

w2 7 8/15 w3 w3 23/30 9/10 w4 1 19/30 w3 5 7/15 w4 1

0.5 0.4 0.9 5/6 19 Different evaluation plans There exist algorithms to implement rank-aware operators as well as incremental evaluation. Efficiency of query evaluation will now depend not only on the regular operators, but also on the rankaware operators. Due to algebraic equivalence laws, we can define additional evaluation plans. Hence, we want a query optimizer to take additional execution plans into consideration. 20 Rank-aware optimizer Extended algebra Extended search space. Impact on enumeration algorithm: Li et al. designed a 2-dimension enumeration algorithm: Dimension 1 = Join size, Dimension 2 = Ranking predicates. The algorithm is exponential in both dimensions.

Heuristics applied to reduce search space. Impact on cost model: For ranking queries, it is more difficult to estimate the query cardinality of the intermediate results, whose accuracy is the core of the cost model. Authors proposed to estimate cardinality by randomly sampling tuples. 21 Critique Erroneous examples. No example of tie-breaking function. Bad explanation of incremental evaluation. 22 Future research directions Cardinality estimation: New/improved techniques for random sampling over joins. Dynamically determined/chosen k. Exploring physical properties of rank-aware execution plans. 23

Recently Viewed Presentations

  • Recíprocas

    Recíprocas

    Recíprocas Son un tipo especial de reflexivas que se construyen con sujeto animado múltiple o plural. La acción es intercambiada por cada uno de los componentes del sujeto
  • 8.2 Cell Growth and Reproduction

    8.2 Cell Growth and Reproduction

    Mitosis vs. Meiosis. Makes EXACT copies. No new genetic diversity. 2n 2n (diploid diploid) Happens in body (somatic) cells. One cell division. Asexual. Makes UNIQUE gametes. Increases genetic diversity. 2n n (diploid haploid) Happens in gamete producing cells (GONADS: testes...
  • Pronoun Notes - Ms. Hannawi&#x27;s Classroom

    Pronoun Notes - Ms. Hannawi's Classroom

    Tahoma Arial Lucida Sans Unicode Wingdings 3 Verdana Wingdings 2 Wingdings Concourse 1_Concourse 2_Concourse 3_Concourse 4_Concourse 5_Concourse 6_Concourse 7_Concourse Pronoun Notes Basics Example Personal Pronouns Possessive Pronouns Possessive Pronouns as Adjectives Interrogative Pronouns Demonstrative Pronouns Indefinite Pronouns
  • SuperSand History The first pilot trial on Tertiary

    SuperSand History The first pilot trial on Tertiary

    History. The first pilot trial on Tertiary filtration of municipal waste water was carried out in August 1978 by Dr. Hans F Larson for Axel Johnson Engineering which became Nordic Water.
  • Welcome to Mr. Sarabia&#x27;s Grade 4/5 Class

    Welcome to Mr. Sarabia's Grade 4/5 Class

    Welcome to Mr. Sarabia's Grade 4/5 Class St. Leo's Catholic School 2015-2016 What will the focus of the class be? We will be adhering to the Ontario curriculum for both grades and all subjects.
  • The Dorsal and Ventral Body Cavities - Weebly

    The Dorsal and Ventral Body Cavities - Weebly

    The Dorsal Body Cavity and Subcompartments. dorsal body cavity - The closed, membrane-lined sterile anatomical space which houses the central nervous system; its lining are the three connective tissue layers known as the meninges; it is located medially on the...
  • 1) According to the text, a key aspect of personality is its ...

    1) According to the text, a key aspect of personality is its ...

    1) According to the text, a key aspect of personality is its _____ quality. Unique. Ever-changing. Situationally-determined. Continuous. Invariant
  • Thesis Statement and Introduction Paragraphs…

    Thesis Statement and Introduction Paragraphs…

    What is a thesis statement? A sentence or two that serves to guide your argument, making it easier for a reader to understand the information and purpose of your paper or essay. A good thesis statement will usually: take on...