DBI-325 Do You Have Big Data? (Most Likely!)


Brian Mitchell, Lead Senior Consultant, Microsoft Big Data COE

Brian by the Numbers
14+ years working with SQL Server
7+ years at Microsoft
Microsoft Certified Master: SQL Server 2008
Analysis Services Maestro: SSAS 2008
3+ years working with SQL Server PDW
Facebook Big Data Research: 68% Brian, 30% Bryan, 2% Other
Blog: http://brianwmitchell.com (#1 blog on SQL PDW)
Twitter: @brianwmitchell (woeful mid-200s followers)

Session Objectives and Takeaways
Session Objective(s):

Examining the three Vs of Big Data
Common Big Data sources
Common Big Data scenarios
The transition to Big Data

[Figure: the three Vs (Variety, Volume, Velocity), with labels fk/pk (foreign key/primary key) vs. k/v (key/value) and big pull vs. small push, positioning SQL Server, PDW, and HDInsight]

Defining the Transition: Volume, Velocity, Variety

Old World Order vs. New World Order (image: Hans Lipperhey)

Capitalizing Means Using Big Data, but What Is It?

[Figure: a volume scale (Megabytes, Gigabytes, Terabytes, Petabytes) against Data Complexity: Variety and Velocity, with regions labeled ERP/CRM, Web 2.0, and Big Data. Sources shown include Payables, Payroll, Inventory, Contacts, Deal Tracking, Sales Pipeline, Web Logs, Digital Marketing, Search Marketing, Recommendations, Advertising, Collaboration, Mobile, eCommerce, Social sentiment, Wikis/blogs, Click stream, Sensors/RFID/devices, Audio/video, Text/image, Log files, Spatial & GPS coordinates, Data market feeds, eGov feeds, and Weather.]

Big Data Is:
Not the size of the data
Not the cool new tools like Hadoop and R
A new paradigm for collecting and using data differently

Traditional DW/BI Environment
[Diagram: Transactional sources, ETL, Backroom/Data Warehouse, OLAP, Reporting]

Statements Customers May Be Saying
"We need to parallelize data operations, but it's too costly and complex."
"The business can't get access to all the relevant data; we need external data."
"We can't match customer master data to live customer interactions."
"We can't force everything into a star schema."
"Our BI reports and charts don't tell us anything we didn't know."
"We are missing the ETL window; the data we needed didn't arrive on time."
"We can't predict with confidence if we can't explore data and develop our own models."

Type of Data Generated by Sector

Why Is Big Data Important?

Companies Are Inefficient

Fundamentals

Common Big Data Sources: Telematics, Text, Time and Place, RFID, Smart Grid, Sensor Telemetry, Social Networks

Common Big Data Algorithms: Finding Similar Items, Mining Data Streams, Link Analysis, Frequent Item Sets, Clustering, Advertising on the Web, Recommendation Systems, Mining Social-Network Graphs

Finding Similar Items: Similar Web Pages, Collaborative Filtering
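Finding similar items, such as near-duplicate web pages, is usually framed as comparing sets. A minimal sketch, assuming each document is represented as its set of 5-character shingles and compared with Jaccard similarity; the helper names and sample texts are hypothetical. At big-data scale, exact pairwise comparison is normally replaced by MinHash signatures and locality-sensitive hashing.

```python
from typing import Set


def shingles(text: str, k: int = 5) -> Set[str]:
    """Return the set of k-character shingles of text after normalizing case and whitespace."""
    normalized = " ".join(text.lower().split())
    if len(normalized) < k:
        return {normalized}  # a text shorter than k becomes a single shingle
    return {normalized[i:i + k] for i in range(len(normalized) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical sample pages: two near-duplicates and one unrelated text.
page_1 = "Big Data is a new paradigm on how to collect and use data."
page_2 = "Big Data is a new paradigm for how to collect and use data."
page_3 = "Recommendation systems fill in a sparse utility matrix."

sim_near = jaccard(shingles(page_1), shingles(page_2))
sim_far = jaccard(shingles(page_1), shingles(page_3))
print(f"near-duplicate: {sim_near:.2f}, unrelated: {sim_far:.2f}")
```

The near-duplicate pair shares most of its shingles, so its score is far higher than the unrelated pair's; a threshold on that score is what flags "similar" pages.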

Mining Data Streams: Sensor Data, Image Data, Internet and Web Traffic

Frequent Item Sets: Market Basket Analysis, Plagiarism, Biomarkers, Related Concepts

Clustering

Recommendation Systems: Content-Based Systems, Collaborative Filtering Systems
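Collaborative filtering recommends items by comparing users' rows of ratings. A minimal user-user sketch, assuming a tiny hypothetical utility matrix (users A through C, items M1 through M5, ratings made up for illustration), cosine similarity with unrated items treated as 0, and a similarity-weighted average of other users' ratings as the prediction. The function names are illustrative, not from any library.

```python
import math
from typing import Dict, Optional

# Hypothetical utility matrix: user -> {item: rating}. A missing key means "not rated".
RATINGS: Dict[str, Dict[str, float]] = {
    "A": {"M1": 4, "M2": 5, "M4": 1},
    "B": {"M1": 5, "M2": 5, "M3": 4},
    "C": {"M3": 2, "M4": 4, "M5": 5},
}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse rating rows; unrated items count as 0."""
    dot = sum(rating * v[item] for item, rating in u.items() if item in v)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(user: str, item: str, matrix: Dict[str, Dict[str, float]]) -> Optional[float]:
    """Estimate user's rating of item as a similarity-weighted average of other users' ratings."""
    weighted_sum = 0.0
    total_weight = 0.0
    for other, row in matrix.items():
        if other == user or item not in row:
            continue  # only other users who actually rated the item contribute
        sim = cosine(matrix[user], row)
        weighted_sum += sim * row[item]
        total_weight += abs(sim)
    if total_weight == 0:
        return None  # no comparable user has rated this item
    return weighted_sum / total_weight


# A has not rated M3; B (similar to A) rated it 4, C (dissimilar) rated it 2.
print(round(predict("A", "M3", RATINGS), 2))
```

Treating blanks as 0 makes "not rated" look like a low rating; a common refinement is to subtract each user's mean rating before computing similarity. A content-based system would instead compare item feature profiles against a profile built from the user's own ratings.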

Recommendation Systems: The Utility Matrix
[Table: example utility matrix of ratings (1 to 5) by users A through E for LOTR1, LOTR2, LOTR3, TW1, TW2, TW3, BB, and TWD; only some cells are filled]

Recommendation Systems: Applications: Product Recommendations, Movie Recommendations, News Articles

Recommendation Systems: Populating the Utility Matrix: Ask the User, Inferences from Users' Behavior

Mining Social Network Graphs: Social Networks, Email Networks, Telephone Networks

Making It Real
Putting it all together: Pig, Hive, Sqoop, SQL Azure, PowerPivot, Power View (Brian Mitchell)

Data Mining: R and SQL Server

R Programming Language
R includes:
an effective data handling and storage facility;
a suite of operators for calculations on arrays, in particular matrices;
a large, coherent, integrated collection of intermediate tools for data analysis;
graphical facilities for data analysis and display either on-screen or on hardcopy;
conditionals, loops, user-defined recursive functions, and input and output facilities.

R Graphing Capabilities

R & SQL Server (Brian Mitchell)

Social Network Analysis: NodeXL

NodeXL
With NodeXL, you can enter a network edge list in a worksheet, click a button, and see your graph, all in the familiar environment of the Excel window. http://nodexl.codeplex.com/releases/view/104762
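The edge list that NodeXL starts from is simply a list of vertex pairs, so the first step of a social network analysis can be reproduced outside Excel. A minimal Python sketch, assuming a CSV edge list with hypothetical vertex names (the "Vertex 1"/"Vertex 2" header only imitates a NodeXL Edges worksheet and is an assumption). It builds an undirected adjacency map and ranks vertices by degree, the simplest centrality measure. This is not NodeXL's own API.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Hypothetical two-column edge list; header and vertex names are illustrative only.
EDGES_CSV = """Vertex 1,Vertex 2
alice,bob
alice,carol
bob,carol
carol,dave
"""


def build_graph(rows: Iterable[List[str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from (vertex, vertex) rows; duplicate edges collapse."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        source, target = row[0].strip(), row[1].strip()
        adjacency[source].add(target)
        adjacency[target].add(source)
    return dict(adjacency)


reader = csv.reader(io.StringIO(EDGES_CSV))
next(reader)  # skip the header row
graph = build_graph(reader)

# Degree = number of distinct neighbors; a simple measure of how connected a vertex is.
degree = {vertex: len(neighbors) for vertex, neighbors in graph.items()}
most_connected = max(degree, key=degree.get)
print(degree, most_connected)
```

Storing neighbors in sets means a repeated edge is counted once. A directed network (for example, who emailed whom) would keep separate in- and out-neighbor maps instead.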

NodeXL (Brian Mitchell)

Predictive Analytics

"The alternative to thinking ahead would be to think backwards... and that's just remembering."
Dr. Sheldon Cooper, The Big Bang Theory

Analytics Maturity
[Figure: Analytics Solutions Maturity, progressing from Performance Analytics to Predictive Analytics to Prescriptive Analytics; Big Data solutions drive analytics solutions maturity. (© 2013 PROS, Inc.)]

Traditional DW/BI Environment
[Diagram: Transactional sources, ETL, Backroom/Data Warehouse, OLAP, Reporting]

Tomorrow's DW/BI Environment
[Diagram: Transactional sources and ETL as before, plus New Data Sources (Social Networks, Sensor Data, Log Data, RFID Data, Automated Data) and HDInsight alongside the Backroom/Data Warehouse, serving Business Critical OLAP and Reporting]

Related Sessions on Big Data
DBI-B339 Predictive Analytics with Microsoft Big Data, Val Fontama and Saptak Sen, Tuesday, June 4 @ 3:15 PM

DBI-B401 Enriching Big Data for Analysis, Adam Jorgensen and Lara Rubbelke, Thursday, June 6 @ 8:30 AM
DBI-B334 Data Management in Microsoft HDInsight: How to Move and Store Your Data, Mike Flasko, Thursday, June 6 @ 1:00 PM

Questions?

Track Resources: Download Data Explorer, Windows Azure, Download GeoFlow, SQL Server Website, Microsoft Virtual Academy (MVA), Hands-On Labs, Get Certified!, @sqlserver

Resources
Learning: Sessions on Demand, http://channel9.msdn.com/Events/TechEd
TechNet: Resources for IT Professionals, http://microsoft.com/technet
Microsoft Certification & Training Resources: www.microsoft.com/learning
MSDN: Resources for Developers, http://microsoft.com/msdn

© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
