Automatic Database Integration - University of British Columbia

Integration Using Unity

Ramon Lawrence, University of Iowa
Ken Barker, University of Calgary

Summary

The Unity prototype tackles the schema integration problem by constructing an integrated, global view bottom-up.

The extraction process, which is semi-automatic, is separated from the integration process. Constructing a global view in this manner requires describing data source semantics using a dictionary and an XML-based language. The integration process itself is then automatic, and no global human integrator is required. Systematic naming using a dictionary allows global queries to be constructed graphically without specifying joins between global relations.

The global view produced has properties similar to a dynamically constructed Universal Relation.

Benefits and Contributions

The architecture automatically integrates relational schemas into a global view for querying. Unique contributions:

• Synthesizing a global view bottom-up instead of top-down improves integration scalability.
• Organizing the global view as a hierarchy of concepts instead of relations or predicates simplifies querying, because the user does not have to name specific relations or join conditions. This is called Querying by Context (QBC).
• Query processing is achieved by dynamically discovering extraction rules based on the naming of fields and tables. The discovered rules are similar to the extraction rules of global-as-view (GAV) systems.

Unity Overview

Unity is a software package that performs bottom-up integration with a GUI.

It was developed using Microsoft Visual C++ 6 and Microsoft Foundation Classes (MFC). Unity allows the user to:

• Construct and modify standard dictionaries.
• Build X-Specs to describe data sources, including extracting metadata using ODBC and mapping system names to dictionary terms.
• Integrate X-Specs into an integrated view.
• Transparently query integrated systems using ODBC and automatically generate SQL queries.

Architecture Components

The architecture consists of four components:

• A standard dictionary (SD) to capture data semantics. SD terms are used to build semantic names that describe the semantics of schema elements.
• X-Specs for storing data source descriptions. Relational database information is stored and transmitted using XML. Each X-Spec stores semantic names to describe schema elements.
• An integration algorithm. Identical concepts in different databases are identified by similar semantic names. The algorithm produces an integrated view of all database concepts.
• A query processor. It allows the user to formulate queries on the view, translates semantic names in the integrated view into SQL queries, and integrates and formats the results. This involves determining the correct field and table mappings and discovering join conditions and join paths.
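To make the components above concrete, here is a minimal Python sketch of how X-Specs and the integration step might fit together. The XML element and attribute names (`xspec`, `table`, `field`, `sem`), the bracketed semantic-name format, the sample schemas, and the function names `read_xspec` and `integrate` are all assumptions made for illustration; the text does not specify Unity's actual X-Spec format or algorithm. Matching here is by exact equality of semantic names, which is the simplest possible reading of "similar semantic names".

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Two hypothetical X-Specs. Each schema element carries a semantic name
# ("sem") built from standard-dictionary terms. The format is illustrative.
ORDER_XSPEC = """
<xspec database="OrderDB">
  <table name="ord_hdr" sem="[Order]">
    <field name="oid" sem="[Order] Id"/>
    <field name="cust" sem="[Order;Customer] Name"/>
  </table>
</xspec>
"""

INVOICE_XSPEC = """
<xspec database="InvoiceDB">
  <table name="INV" sem="[Invoice]">
    <field name="order_no" sem="[Order] Id"/>
    <field name="amt" sem="[Invoice] Total Amount"/>
  </table>
</xspec>
"""


def read_xspec(xml_text):
    """Yield (semantic name, (database, table, field)) for every field."""
    root = ET.fromstring(xml_text.strip())
    database = root.get("database")
    for table in root.iter("table"):
        for field in table.iter("field"):
            location = (database, table.get("name"), field.get("name"))
            yield field.get("sem"), location


def integrate(xspecs):
    """Group physical fields from all X-Specs under their semantic names."""
    view = defaultdict(list)
    for xml_text in xspecs:
        for sem, location in read_xspec(xml_text):
            view[sem].append(location)
    return dict(view)


integrated_view = integrate([ORDER_XSPEC, INVOICE_XSPEC])
for sem in sorted(integrated_view):
    print(sem, "->", integrated_view[sem])
```

In this sketch `[Order] Id` is described in both X-Specs, so it appears once in the integrated view with two physical mappings, while a concept described in only one source appears with a single mapping. No human decides which fields correspond; the shared dictionary terms do that work, which is also why inconsistent descriptions reduce precision.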

Querying by Context (QBC)

Querying by context (QBC) is a methodology for querying relational databases by semantics. QBC performs dynamic closure, relating concepts for the user as they browse the integrated view.

Querying is performed by selecting semantic names that represent query concepts from the integrated view. The integrated context view contains all concepts present in the databases, referenced by semantic names. This allows a limited form of recursive queries and eliminates the need for the user to specify joins. The query processor maps the user's selections and criteria to an actual SQL query.

References

Publications:

• Unity - A Database Integration Tool, R. Lawrence and K. Barker, TRLabs Emerging Technology Bulletin, Jan. 2000.
• Multidatabase Querying by Context, R. Lawrence and K. Barker, DataSem2000, pages 127-136, Oct. 2000.
• Integrating Relational Database Schemas using a Standardized Dictionary, SAC2001 - ACM Symposium on Applied Computing, pages 225-230, March 2001.
• Querying Relational Databases without Explicit Joins, DASWIS 2001 - International Workshop on Data Semantics in Web Information Systems (with ER'2001), Nov. 2001.

Further Information:

http://www.cs.uiowa.edu/~rlawrenc/

Integration Example

BodyWorks is a fictional company with 3 legacy databases that must be integrated for management reporting.

[Figure: BodyWorks Systems. Labels: Customer Web Server, Order Database, Invoice Database, Custom Accounting Package, Shipment Tracking Software, Shipment Database.]

[Figure: Query-Driven Data Extraction. Labels: Integrated Context View, Unity Software, X-Spec Editor, Standard Dictionary, Integration Algorithm, Query Processor and ODBC Manager, ODBC Querying, Invoice Database, Order Database, Shipment Database.]

Integration Processes

Integration is performed with 3 separate processes:

• Capture process: independently extracts database schema information into an XML document called an X-Spec. This process is a semi-automatic description using a dictionary.
• Integration process: combines X-Specs into a structurally neutral hierarchy of database concepts called an integrated context view. This process performs automatic name matching, but imprecision may occur.
• Query process: allows the user to formulate queries on the integrated view. The query processor maps these to structural queries (SQL), executes them using ODBC, and combines the results using global keys. Users do not have to specify joins when querying the global view.

The Unity Prototype

What is the open problem?

The GAV and LAV approaches are both viable methods for solving data integration.

However, the open problem is that neither approach performs schema integration, that is, the construction of the global view (GV) itself:

• GAV: the GV is constructed (schema integration is performed) by a global designer when specifying extraction rules.
• LAV: the GV is pre-defined using some previous integration process (most likely manual in nature).

Both methods rely on a global user to create the global schema.

How Unity is Different

Our integration architecture, called Unity, approaches the integration problem from a different perspective: how can we automate, or semi-automate, the construction of the global view by extracting information from the local data sources? Thus, the integration problem is tackled from a different set of starting assumptions:

• Do not assume a pre-existing or manually created GV.
• However, assume we have a dictionary and a language for describing schema and data element semantics.
• Attempt to automatically build a GV from the descriptions of each data source.

The Unity Approach

Given a set of data sources, a dictionary, and a language to describe data semantics:

1) Semi-automatically extract and represent data source semantics in the language using the dictionary.
2) Automatically match concepts across data sources by using the dictionary to determine related concepts. This process effectively builds the global-level relations or objects initially assumed or created in other approaches. However, since there is no manual intervention, the precision of global view construction is affected by inconsistencies in the descriptions of the data sources and in the matching of concepts.
3) Automatically generate queries specified by the user using dictionary terms (not structures) and map the user's query to the appropriate data elements in the local sources.

What is wrong with SQL?

There is nothing wrong with SQL. However, SQL is not a simple query language for many reasons:

• Querying by structure does not hide complexities introduced by database normalization.
• Structures (fields and tables) may be assigned poor names that do not adequately describe their semantics.
• The notion of a join is confusing for beginner users, especially when multiple joins are present.
• SQL forces structural access, which does not provide logical query transparency and restricts logical schema evolution.
• Querying multiple databases (without a global view) using SQL variants is complex because naming and structural conflicts must be resolved during query formulation.
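The contrast with querying by semantic names can be sketched in Python. Here the user selects two semantic names and a small function builds the SQL, including the join, so the user never names a table or writes a join condition. Everything in the sketch is an assumption for illustration: the table and column names, the `MAPPING` and `JOINS` dictionaries (standing in for the field mappings and join conditions that the query processor determines), the `build_sql` helper, and the use of Python's standard `sqlite3` module in place of ODBC.

```python
import sqlite3

# Hypothetical mapping from semantic names to (table, column).
MAPPING = {
    "[Customer] Name": ("cust", "cname"),
    "[Order] Id": ("ord_hdr", "oid"),
    "[Order] Date": ("ord_hdr", "odate"),
}

# Hypothetical join conditions between table pairs; keys are sorted tuples.
JOINS = {
    ("cust", "ord_hdr"): "cust.cid = ord_hdr.cid",
}


def build_sql(semantic_names):
    """Translate selected semantic names into one SQL SELECT statement."""
    columns, tables = [], []
    for sem in semantic_names:
        table, column = MAPPING[sem]
        columns.append(f'{table}.{column} AS "{sem}"')
        if table not in tables:
            tables.append(table)
    sql = f"SELECT {', '.join(columns)} FROM {tables[0]}"
    joined = [tables[0]]
    for table in tables[1:]:
        # Find a known join condition linking this table to one already used.
        for other in joined:
            condition = JOINS.get(tuple(sorted((table, other))))
            if condition:
                sql += f" JOIN {table} ON {condition}"
                break
        else:
            raise ValueError(f"no join path to {table}")
        joined.append(table)
    return sql


# A tiny in-memory database with poorly named, normalized tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cust (cid INTEGER PRIMARY KEY, cname TEXT);
    CREATE TABLE ord_hdr (oid INTEGER PRIMARY KEY, cid INTEGER, odate TEXT);
    INSERT INTO cust VALUES (1, 'Ann');
    INSERT INTO ord_hdr VALUES (10, 1, '2001-03-01');
""")

sql = build_sql(["[Customer] Name", "[Order] Id"])
print(sql)
rows = conn.execute(sql).fetchall()
print(rows)
```

The user-facing input is only the list of semantic names. The cryptic structure names (`cust`, `ord_hdr`) and the join on `cid` stay hidden inside the mapping, so a renamed column or a restructured table would change the mapping but not the query the user writes.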
