DAMA-Big Data (R)evolution Presentation - DAMA Iowa

Big Data - Technical Architecture
Roni Schuling - Enterprise Architecture
Tom Scroggins - IS Domain Architecture
Principal Financial Group

AGENDA
• Foundational Definitions & where these technologies came from
  o Big Data
  o NoSQL
  o Hadoop
• Business & Technical Drivers
• How they are being used in many companies
• Predictions for the future
• Challenges & Obstacles
• Questions

Foundational Definition - Big Data
Big data is an evolving term that describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information.
Big data can be characterized by 3Vs: the extreme volume of data, the wide variety of types of data and the velocity at which the data must be processed. There are many other aspects as well, such as Viscosity, Complexity and Ambiguity.
Data in a corporation that cannot be processed using traditional data management techniques and technologies can be broadly classified as Big Data.

[Figure: Big Data, Hadoop, NoSQL]
Hadoop & NoSQL are key technologies for working with Big Data effectively.

Foundational Definition - NoSQL
A NoSQL database, also called Not Only SQL, is an approach to data management and database design that's useful for very large sets of distributed data.
NoSQL seeks to solve the scalability and big data performance issues that relational databases weren't designed to address.
NoSQL is especially useful when an enterprise needs to access and analyze massive amounts of unstructured data or data that's stored remotely on multiple virtual servers in the cloud.
However - NoSQL is not just about Big Data.

Where this technology came from - NoSQL
[Timeline figure. Axis: 1970, 1980, 1990, 2000, 2005, 2007, 2010, 2014+]
• Eras shown: Flat Files; Rise of Relational Databases; Relational Database Dominance; Rise of Object Databases; Polyglot Persistence
• Document DB: Inspired by Lotus Notes
• Key Value Store: Replicate Data during 24x7 Availability
• Need to Store Tabular Data in Distributed System
• Many Innovators In The 2005 to 2010 Timeframe
• Polyglot Persistence: Enterprise will have a variety of different data storage technologies for different kinds of data & application needs

[Figure: NoSQL market landscape]
Market view of what's out there; we do NOT have all of these at PFG today. There are over 150 NoSQL databases in the market; these are just a few of the top ones.

Big Data - Data Architecture at PFG

Foundational Definition - Hadoop
Hadoop is an open source, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.
Hadoop makes it possible to run applications on systems with thousands of nodes involving thousands of terabytes. Its distributed file system facilitates rapid data transfer rates among nodes and allows the system to continue operating uninterrupted in case of a node failure. This approach lowers the risk of catastrophic system failure, even if a significant number of nodes become inoperative.

Where this technology came from - Hadoop
[Timeline figure. Axis: 1995, 2004, 2005, 2006, 2010, 2014+]
• 1995 - 2005: Yahoo! Search team builds 4+ generations of systems to crawl & index the WWW. 20 Billion pages!
• Google publishes Google File System & MapReduce papers
• Yahoo! staffs Juggernaut, open source DFS & MapReduce
• Doug Cutting builds Nutch DFS & MapReduce, joins Yahoo!
• Juggernaut & Nutch join forces - Hadoop is born!
• Other Internet companies add tools / frameworks to enhance Hadoop
• Service providers step into the market - provide training, support, & hosting
• Trend labels: Mass Adoption; Analytic Tool Interoperability; Enterprise Grade Security

[Figure: The Hadoop Vendor Landscape - 2014]
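As a rough illustration of the MapReduce model named in the Hadoop history above, the following sketch builds a small search index in plain Python. It is not Hadoop code: the function names, the in-memory `shuffle` step, and the sample pages are all assumptions made for illustration, and a real cluster would run the map and reduce steps on many nodes in parallel.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Map step: emit one (word, doc_id) pair per word in a page."""
    for word in text.lower().split():
        yield word, doc_id

def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(word, doc_ids):
    """Reduce step: collapse the ids for one word into a sorted, de-duplicated list."""
    return word, sorted(set(doc_ids))

def build_index(pages):
    """Run map, shuffle and reduce in sequence on a single machine."""
    pairs = [pair for doc_id, text in pages.items()
             for pair in map_phase(doc_id, text)]
    return dict(reduce_phase(word, ids) for word, ids in shuffle(pairs).items())

# Hypothetical sample pages standing in for crawled web content.
pages = {
    "page1": "big data needs big storage",
    "page2": "hadoop stores big data",
}
index = build_index(pages)
```

Because each map call sees one page and each reduce call sees one word, the work can in principle be spread across nodes, which is the property the definition above relies on.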

Business Drivers
• Provide access to all data needed for analytics (internal or external)
• Provide the ability to realistically interact with greater depths of data, e.g. tens of years instead of a couple of months
• Provide a greater speed to insight for all types of requests
• Lower the total cost of ownership across the enterprise for analytics
• Allow for exploration of our data in ways we never anticipated, to identify differentiating understanding of customers and markets
There's an imbalance today.

Technical Drivers
Current technical capabilities don't align with changing expectations.

How they are being used today

NoSQL
• Not focused on Big Data... yet
• Many companies using or at least experimenting with MongoDB
• Document store for web applications that only need to persist the content for the lifespan of that interaction.
• Using NoSQL stores for user preferences to personalize what is presented on a web page for their interaction.
• Beginning to organize social streams of data
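The user-preference use case above can be sketched with a plain dictionary standing in for a document collection. This is only an illustration under stated assumptions: the `preferences` data, the `DEFAULTS` values and the `page_settings` helper are hypothetical, and no particular NoSQL product's API is shown.

```python
# In-memory stand-in for a document collection, keyed by user id (hypothetical data).
preferences = {
    "u1": {"theme": "dark", "widgets": ["balances", "news"]},
    "u2": {"language": "es"},  # documents need not share the same fields
}

# Site-wide defaults applied when a user's document omits a field (assumed values).
DEFAULTS = {"theme": "light", "language": "en", "widgets": ["balances"]}

def page_settings(user_id):
    """Merge a user's stored preferences over the site defaults."""
    doc = preferences.get(user_id, {})
    return {**DEFAULTS, **doc}
```

For example, `page_settings("u2")` keeps the stored language and falls back to the default theme, which is the kind of per-user personalization the bullet describes.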

Hadoop
• Interrogating our web logs to better understand the behavior of people interacting with a website.
• Merging that semi-structured web activity with other structured legacy data.
• Massive storage of data for exploration and discovery, often using interoperability with analytic consumption tools.

Plans for the future

NoSQL
• Database for web applications that need that speed of development and nimbleness.
• Layering of NoSQL solutions on top of Hadoop to improve searchability and performance.
• Exploration of Graph NoSQL solutions for analytics on hierarchical type data.

Hadoop
• Expansion of web activity data (more logs, more data in logs, more use cases).
• Speech-to-text translation of call recordings and text analysis / natural language processing to determine call topics and caller sentiment.
• Extraction of text from documents to aid in analysis.
• Data Lake solutioning both for ingestion and archive.

[Figure: Lake of Data]
[Figure: Data Refinery]
[Figure: Many Kinds of data in our organization. Conceptually for illustration; not a vetted/approved picture of the PFG environment.]
[Figure: Conceptual Workload Isolation Today. Conceptually for illustration; not a vetted/approved picture of the PFG environment.]
[Figure: Conceptual Workload Isolation in the Future. Conceptually for illustration; not a vetted/approved picture of the PFG environment.]

Big Data technologies are broader than just Hadoop & NoSQL, but those are the key starting points for us.

[Figure: Big Data market landscape]
Market view of what's out there; we do NOT have all of these at PFG today.

Challenges and Obstacles to overcome
• Security
• Governance
• Clear Use Cases
• Integration Points
• Hosting models

Q&A
[email protected]

NoSQL Data Architecture & Best Practices

Data View - Overview
We are in a Database Revolution
• Existing paradigms are being challenged
  o Models
  o Hardware
  o Software
  o Languages

• Will tweaking current data solutions be enough?

Data View - Five Data Paradigms

Relational Model
PROs
• Most flexible queries & updates
• Reuse data structures in any context
• Great DB-to-DB integration
• Mature tools
• Standard query language
• Easy to hire expertise
CONs
• Design-time, static relationships
• Design-time, static structures: design first then load data
• Hard to normalize model
• Requires code to integrate relational data with object-oriented code
• Cannot query for relevance

Dimensional Model
PROs
• Queries facts in context
• Self-service, ad hoc queries
• High-performance platforms
• Mature tools and integration
• Standard query language
• Turns data into information
CONs
• Expensive platforms
• Design-time, static relationships
• Design-time, static structures: design first then load data
• Cannot query for relevance
• Cannot query for answers that are not built into the model

What's wrong (aka challenging) with SQL DBs?
• Relevance
• Velocity
• Volume
• Variety
• Variability

Key Value / Column Family Models

PROs
• Fast puts and gets
• Massive scalability
• Easy to shard & replicate
• Data colocation
• Simple to model
• Inexpensive
• Data in transactional context
• Developer in control
CONs
• Carefully design key
• Shred JSON into flat columns
• Secondary indexes required to query outside of hierarchical key
• No standard query API or language
• Hand code all joins in app
• Immature tools and platform
• Hard to integrate and hire
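Several of the key-value trade-offs above (careful key design, secondary indexes, hand-coded joins) can be seen in a small sketch. Everything here is assumed for illustration: the key layout `customer:<id>:order:<id>`, the `by_state` index and the helper names are hypothetical, and a Python dict stands in for the store. A real store would typically serve the prefix lookup with a range scan rather than a full pass over all keys.

```python
store = {}      # primary key-value store (dict stand-in)
by_state = {}   # hand-maintained secondary index: state -> set of customer keys

def put_customer(cust_id, name, state):
    key = f"customer:{cust_id}"
    store[key] = {"name": name, "state": state}
    # The application, not the store, keeps the secondary index up to date.
    by_state.setdefault(state, set()).add(key)

def put_order(cust_id, order_id, total):
    # Hierarchical key colocates a customer's orders under one prefix.
    store[f"customer:{cust_id}:order:{order_id}"] = {"total": total}

def orders_for(cust_id):
    """Fetch all orders for one customer by key prefix."""
    prefix = f"customer:{cust_id}:order:"
    return [value for key, value in sorted(store.items()) if key.startswith(prefix)]

def totals_by_state(state):
    """A join coded by hand: index lookup, then one prefix fetch per customer."""
    result = {}
    for key in sorted(by_state.get(state, ())):
        cust_id = key.split(":")[1]
        result[store[key]["name"]] = sum(o["total"] for o in orders_for(cust_id))
    return result

# Hypothetical sample data.
put_customer("1", "Ann", "IA")
put_customer("2", "Bob", "NE")
put_order("1", "a", 30)
put_order("1", "b", 12)
put_order("2", "a", 5)
```

Querying by customer id is a simple prefix fetch, while querying by state only works because the application maintains `by_state` itself.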

Document Model
PROs
• Fast development
• Schemaless, run-time designed, rich, JSON and/or XML data structures
• Queries everything in context
• Self-service, ad hoc queries
• Turns data into information
• Can query for relevance
CONs
• Defensive programming for unexpected data structures
• Expensive platforms, immature tools, and hard to integrate
• Non-standard query languages, and hard to hire expertise
• Not as fast as Column-Family / Key-Value databases

Graph Model
PROs
• Unlimited flexibility - model any structure
• Run time definition of types & relationships
• Relate anything to anything in any way
• Query relationship patterns
• Standard Query Language (SPARQL)
• Creates maximum context around data
CONs
• Hard to model at such a low level
• Hard to integrate with other systems
• Immature tools
• Hard to hire expertise
• Cannot query for relevance because original document context is not preserved

What's wrong (aka challenging) with NoSQL DBs?
• Developer responsible for consistency (handle threading)
  o Locks
  o Contention
  o Serialization
  o Dead Locks
  o Race Conditions
  o Threading Bugs
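To make the "developer responsible for consistency" point concrete, here is a minimal sketch of a read-modify-write update that the application itself must guard with a lock. The `Counter` class and `run` helper are hypothetical names, and Python's standard `threading` module stands in for whatever concurrency the application actually has.

```python
import threading

class Counter:
    """A read-modify-write value guarded by a lock that the application owns."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads could read the same value
        # and one of the two updates would be lost (a race condition).
        with self._lock:
            current = self.value
            self.value = current + 1

def run(n_threads=8, per_thread=1000):
    """Increment one shared counter from several threads and return the final value."""
    counter = Counter()

    def worker():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

The lock serializes the updates, which is also where contention comes from: every writer waits its turn, so correctness is bought with throughput.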

Data View - Modeling Takeaways
Each model has a specialized purpose:
• Dimensional: Business intelligence reporting and analytics
• Relational: Flexible queries, joins, updates, mature, standard
• Column / Key-Value: Simple, fast puts and gets, massively scalable
• Document: Fast development, schemaless JSON/XML, searchable
• Graph / RDF: Modeling anything at runtime, including relationships
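The takeaways above can be illustrated by holding the same fact in four of the five models. This is only a sketch: the customer record, the `customer:1` key format and the predicate names are invented for the example, SQLite stands in for a relational database, and plain Python structures stand in for the NoSQL stores. The dimensional model is left out to keep the example short.

```python
import sqlite3

# Relational: the schema is declared before any data is loaded.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
db.execute("INSERT INTO customer VALUES (1, 'Ann', 'Des Moines')")
relational_city = db.execute("SELECT city FROM customer WHERE id = 1").fetchone()[0]

# Key-value: an opaque value fetched by a designed key; the app parses it.
kv = {"customer:1": "Ann|Des Moines"}
kv_city = kv["customer:1"].split("|")[1]

# Document: a nested, schemaless structure read in context.
doc = {"id": 1, "name": "Ann", "address": {"city": "Des Moines"}, "tags": ["retirement"]}
doc_city = doc["address"]["city"]

# Graph / RDF-style triples: types and relationships are defined at run time.
triples = [
    ("customer:1", "name", "Ann"),
    ("customer:1", "lives_in", "city:des_moines"),
    ("city:des_moines", "label", "Des Moines"),
]

def objects(subject, predicate):
    """Return every object linked from subject by predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

city_node = objects("customer:1", "lives_in")[0]
graph_city = objects(city_node, "label")[0]
```

All four lookups return the same city; what differs is how much structure is fixed up front and how much work the application does to get the answer.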

Data View - How do you choose?
• How much Durability do you need? Durable data survives system failures & can be recovered after unwanted deletion.
• How much Atomicity do you need? An atomic transaction is all or nothing, for sets of data and/or sets of commands.
• How much Isolation do you need? Isolation prevents concurrent transactions from affecting each other.
• How much Consistency do you need (or when do you need it)? Consistency exists when data is committed and consistent with all data rules at a point in time.

Durability
• Can you live with writing advanced code to compensate?
  o Trusting all developers to properly check for partial transaction failures and the current physical layout of the data cluster, and to write code to propagate data across the cluster.
• Can you live with lost data?
  o No logs, archives, mirroring, etc.
• Can you live with accidental deletion of data?
  o No point-in-time recovery feature
• Can you live with scripting your own backup & recovery solutions?

Atomicity
• Can you live with modifying single documents at a time?
• Can you live with partially successful transactions?
  o You can achieve higher availability because transactions can partially succeed.
• Can you live with inconsistent and incomplete data?
  o Is it OK to not know whether data anomalies are caused by bugs in your code or are temporarily inconsistent because they haven't been synchronized yet?
• Can you live with writing advanced code to compensate?
  o Custom solutions for atomic rollback, handling of transactions that fail, finding & fixing inconsistent data.

Isolation
• Can you live with modifying single documents at a time?
• Can you live with inaccurate queries?
  o Without isolation, query results are inaccurate because concurrent transactions can change data while a query is processing it.
• Can you live with race conditions and deadlocks?
• Can you live with writing advanced code to compensate?
  o Your own versioning system, code to hide concurrent updates, inserts and deletes from queries, and handling of race conditions and deadlocks.

Consistency - Do you need complete consistency?

Not necessarily; instead, you may prefer:
• Absolute fastest performance at lowest hardware cost
• Highest global data availability at lowest hardware cost
• Working with one document at a time
• Writing advanced code to create your own consistency model
• Eventually consistent data
• Some inconsistent data that can't be reconciled
• Some missing data that can't be recovered
• Some inconsistent query results

What do you need most?
• Highest performance for queries and transactions
• Highest data availability across multiple data centers
• Less data loss (e.g. Durability)
• More query accuracy & fewer deadlocks (e.g. Isolation)
• More data integrity (e.g. Atomicity)
• Less code to compensate for lack of ACID compliance
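What "code to compensate for lack of ACID compliance" can look like is sketched below for the atomicity case: a transfer across two documents where only single-document updates are assumed to be atomic. The `accounts` data, the `TransferError` class and both functions are hypothetical, and a dict stands in for the document store.

```python
accounts = {"A": {"balance": 100}, "B": {"balance": 20}}  # hypothetical documents

class TransferError(Exception):
    """Raised when a single-document update cannot be applied."""

def update_balance(doc_id, delta):
    """Single-document update: the only operation assumed to be atomic here."""
    doc = accounts.get(doc_id)
    if doc is None:
        raise TransferError(f"no such document: {doc_id}")
    if doc["balance"] + delta < 0:
        raise TransferError(f"insufficient funds in {doc_id}")
    doc["balance"] += delta

def transfer(src, dst, amount):
    """Two single-document updates with a hand-written compensating step."""
    update_balance(src, -amount)
    try:
        update_balance(dst, +amount)
    except TransferError:
        # Compensate: undo the first update so the transfer is all or nothing.
        update_balance(src, +amount)
        raise
```

Even this small sketch leaves gaps that a database transaction would cover: another reader can see the debit before the credit lands, and a crash between the two updates leaves nothing to run the compensating step.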

Key Points
• RDBMSs will always have an important place in our architecture.
• NoSQL implementations have a benefit to our future.
• Once you have a list of NoSQL databases that meet your modeling needs, choose the one that best meets your need for velocity and volume.
• It is not a one-or-the-other, all-in choice to make.
