PowerPoint 演示文稿 - VLDB

PowerPoint 演示文稿 - VLDB

CRIUS: User-Friendly Database Design Li (Eric) Qian, Kristen LeFevre, H. V. Jagadish University of Michigan, Ann Arbor Outline Motivation Interface Algebra Guidance Feature Storage Evaluation Motivation Non-technical people directly exposed to dat a. Hard to design a schema in advance. Start with a simple structure and grow it as needed. We call this process organic schema evolutio n

Motivation Contd While users have the freedom of organically growi ng their schema, the data is now subject to denor malization. Consequently, users have to explicitly deal with d uplicated data entries, which may produce errors t hat violate integrity constraints. Therefore, an organic database system must: Make it easy for the end user to make schema changes Guarantee efficient and safe data entry Implement these features with low cost Challenges Schema Update Specification Data Migration Data Entry Schema Evolution Performance Outline

Motivation Interface Algebra Guidance Feature Storage Evaluation Spreadsheet? Flat spreadsheets v.s. Hierarchical semantics Name City Address Mary Chicago 2364 Bishop KeithName

City Ann Arbor 101Address Plymouth Keith Ann Arbor 202 Main ID Name City 1 Mary Chicago 2 Keith Ann Arbor Person

ID Address 1 2364 Bishop 2 101 Plymouth 2 202 Main Address How to support hierarchical semantics? We permit nesting! Name City Mary

Chicago Keith Ann Arbor [Address] Address 2364 Bishop 101 Plymouth 202 Main Span Table Span Table: a next-generation spread sheet that nests data in a single repre sentation: schem dat a a Specify an evolution by dragging StateName inside Address Specify an evolution by dragging Person upward. Outline

Motivation Interface Algebra Guidance Feature Storage Evaluation Data Migration in Schema Evolution Data needs to be migrated from the old schema to the new one. May involve data copy/merge. Users need to edit in a cell-by-cell manner. Name City Address Mary Chicago

2364 Bishop Keith Ann Arbor 101 Plymouth Keith Ann Arbor 202 Main Name City [Address] Address Mary Chicago Keith Ann Arbor 2364 Bishop 101 Plymouth

202 Main Introducing Operators! Schema restructuring operators: Extended spreadsheet operators: IMPORT, EXPORT, FLOAT, SINK Schema modification: Adding/Dropping Columns Data manipulation: Inserting/Deleting/Updating Tuples Collectively, we call this set of operators Span Tabl e Algebra. Span Table Algebra: Schema Restructuring Operators Operator Description Import(A)

Move A inward into a descendant relation. Export(A) Move A outward into an ancestor relation. Sink(A) Push A to create a new leaf relation. Float(A) Lift A to create a new intermediate level. Name Sink(Addres Import(Cit Export(Cit s) y) y) City [Address] Address [Address] Mary City Chicago Address

2364 Bishop Mary Keith Chicago Ann Arbor 2364 Bishop 101 Plymouth Keith Keith Ann Arbor Ann Arbor Ann Arbor 101 202 Plymouth Main 202 Main Span Table Algebra: Expressive Power Analysis Import and Export etc. can be expressed in terms of Nest an d Unnest:

Nest and Unnest can be expressed as a sequence of Span Ta ble Operators: Detailed proofs in paper appendix. Outline Motivation Interface Algebra Guidance Feature Storage Evaluation Inevitable Denormalization

Traditional design uses data integrity constraints We can not do this since we have no pre-defined c onstraints Denormalization A B C a0 b0 c0 a0 b1 b0 c1 FD: A B Guide User Data Entry We maintain a set of soft functional depe ndencies (FDs) to guide user data entry:

Inductive completion Error prevention (1) rollback (2) also update relevant entries to preserve data integrity (3) force the entry and update the soft FDs. ID Name Course Grade 1 Peter Math A 2 Peter

Physics A 3 Leo Math B C Leo FD: Name Grade C B FD: Name, Course Grade How to Manage FDs? Frequent data entry Frequent FD re-induction Past solution too expensive to be applied

Incremental FD Induction (IFDI): Induce Initial FDs and maintain important data structures. Maintain these structures and incrementally re-induce FD s. We optimize the way to update these structures so that th e algorithm is able to respond in real time. Outline Motivation Interface Algebra Guidance Feature Storage Evaluation Vertical Partitioning

Span tables are vertically partitioned and stored in relational databases. Connecting span table to underlying storage: Upward mapping Downward mapping Outline Motivation Interface Algebra Guidance Feature Storage Evaluation Evaluation: Our experiments are designed to answer four questions: Span

Table usability Guidance feature usability IFDI efficiency Storage performance Evaluation: User Study on Schema Operations Tasks: Measure: Schema Design: Create the schema for an address book. Schema Update: Move an attribute from one relation to another in a gene database. Time to complete each task. Compared against SSMS (MS SQL Server Management Studi o 2008). All users failed in this task using SSMS since they were unable to migrate the data manually. In

contrast, all of them were able to complete the task within seconds with CRIUS. Schema Design Schema Update Evaluation: User study on Integrity-Based Guidance The three tasks: Insert a new contact and his ad dress into the address book. Update the cell phone number o f one contact. Update the address of one cont act to the address of another co ntact. Measure: time to complete each task, and overall count of key strokes/mo

use clicks. Compare with and without the gui dance feature on. Conclusion The design and implementation of CRIUS Span table algebra Integrity-based guidance based on IFDI Storage Evaluation ? ? Question s IFDI: Inducing Initial FDs ID Name Cours e Grade 1

Peter Math A 2 Peter Physic s A 3 Leo Math B 4 Leo Physic s B

5 Jack Partitions: Math A Attribute PN = {(1,2), (3,4), (5)} PC = {(1,3,5), (2,4)} PG = {(1,2,5), (3,4)} PNC = {(1), (2), (3), (4), (5)} PNG = {(1,2), (3,4), (5)} X Y iff PX = PCG = {(1,5), (2), (3), P XUY (4)} PNCG = {(1), (2), (3), Attribute Lattice: {(1,2), (3,4)} {(1,3,5), (2,4)} N C

{} NC {(1,2), (3,4)} NG {(1,2,5), (3,4)} G {(1,5)} CG {} PXUY = PX PY NCG N G since PN = PNG NC G since PNC = PNCG (dominated by the above) IFDI: Maintaining FDs on Value Update Name Cours e

Grade 1 Peter Math A 2 Peter Physic s AB 3 Leo Math B 4 Leo

Physic s B 5 Jack Math A Attribute Lattice: {(1,2), (3,4)} {(1,3,5), (2,4)} N C G {(1,2,5), (3,4)} ID

{(1,5), (2,3,4)} {} {(1,5)} Attribute Partitions: NC NG CG {(1,5), (2,4)} PN = {(1,2), (3,4), (5)} {(1,2), (3,4)} PC = {(1,3,5), (2,4)} {(3,4)} PG = {(1,2,5), (3,4)} PG = {(1,5), (2,3,4)}{} NCG {} PNC = {(1), (2), (3), (4), (5)} Only visit half PNG = {(1,2), (3,4), (5)} PNG = {(1), (2), (3,4), of the lattice (5)} nodes! PCG = {(1,5), (2), (3), (4)} PCG = {(1,5), (2, 4), (3)} N G no longer holds since PN X Y iff

P = PNCG = {(1), (2),X (3), (4), (5)} PNCG PNG= {(1), (2), (3), (4), (5)} PXUY NC G since PNC = PNCG IFDI: Maintaining FDs on Value Update Contd How do we efficiently update attribute partitio P = {(1,5), (2), (3), (4)} P = {(1,5), (2, 4), (3)} when tuple 2 ns? CG CG is updated. Naively re-computing product: P =P P

CG C G PC = {(1,3,5), (2,4)} PG = {(1,2,5), (3,4)} S S111 = = {1,5} {3} {} S S222= = {2} {4} {} PPCGPCGCG === {(1,5), {(1,5), {} (2), (2)}(3), (4)} PCG = PC PG PC = {(1,3,5), (2,4)} PG = {(1,5), (2,3,4)} PCG = {(1,5), (2, 4), (3)} Incrementally update product:

PCG = Update (PCG , PC , PG , PCGtid) = {(1,5), (2), (3), (4)} tid = 2 PC = {(1,3,5), (2,4)} PG = {(1,5), (2,3,4)} 1) Remove tuple from the old group: 2) Add tuple to the new group: P PCG ={(1,5), {(1,5),(2), (2, (3),4), (3), (4)} (3)} (4)} CG= Evaluation: User Study on Schema Operations Contd Task: Measure:

move an attribute across relations in a gene database (the same as before). time to complete the task. Compare CRIUS with a strawman system with only nested relationa l operators. Evaluation: Performance of IFDI Task: Measure: Re-generate the minimal FDs on value update. The time to complete the task. Compare IFDI with the naive algorithm. a five-column table with varying row size

a ten-thousand-row table with varying column size. Evaluation: Performance of Vertical Storage Tasks: Execute an schema update. Load data from the relational back-end and construct a span table. Measure: Time to complete each task. Compare CRIUS with the naive storage ms MB Time to move an attribute with varying DB size.

Time to display data with varying DB size.

Recently Viewed Presentations

  • Finding and Citing Research

    Finding and Citing Research

    Use transitions and use your own words to introduce the quotations and sources. USING SOURCES IN YOUR ESSAY Attributive tags: Sometimes information that is taken from a source can seem like your own words when attributive tags are left out....
  • Contract Quality PDS, and SPS

    Contract Quality PDS, and SPS

    PDS Initiative. AT&L tasked DPAP (June 5, 2007) to develop a procurement data strategy for DoD. The Procurement Data Standard (PDS) is a system-independent data standard that is intended to be adopted and implemented DoD-wide for creation, translation, processing, and...
  • Cost-Efficient Memory Architecture Design of NAND Flash Memory

    Cost-Efficient Memory Architecture Design of NAND Flash Memory

    case handling ~ /cache miss handling ~ practical problem for mobile systems (e.g. Cell phones)-time-critical interrupt handling e.g. call processing.-e.g. if interrupt during cache miss handling->system can miss deadline / lose data or connection. 3. Bad . block . management...
  • USING PARAGRAPHS - Carre's Grammar School

    USING PARAGRAPHS - Carre's Grammar School

    USING PARAGRAPHS TO IMPROVE OUR WRITING What are paragraphs? A paragraph is a group of sentences about one main idea. Why do we use paragraphs? They break up long chunks of writing, making it clearer and more interesting for the...
  • sex and gender - Quia

    sex and gender - Quia

    She went public after the photo surfaced. Hamilton, a New York resident who is half-Swedish and was raised in France, has been looking for another job since she was let go in April, said Jesse Derris, her spokesman at Sunshine...
  • EXAMINATION OF WITNESSES - Kane Russell Coleman Logan PC

    EXAMINATION OF WITNESSES - Kane Russell Coleman Logan PC

    Avoid forcing the jury to read lengthy, complicated or convoluted paragraphs of prose. They don't like it. Use documents to punctuate testimony, to establish timeframes, sequence of events, etc. Better for the story to be told out of the mouths...
  • Andes Tutoring: Freedom, Support, and Accelerated Learning as

    Andes Tutoring: Freedom, Support, and Accelerated Learning as

    Andes Tutoring: Freedom, Support, and Accelerated Learning as Students Solve Complex Physics Problems Kurt VanLehn & Brett van de Sande School of Computing, Informatics and Decision Engineering
  • Mendelian Genetics - Biology

    Mendelian Genetics - Biology

    Studying human genetics. You cannot make humans of different types breed together. Pedigree charts offer an ethical way of studying human genetics . Today genetic engineering has new tools to offer doctors studying genetic diseases . A genetic counsellor will...