DBI325
Do You Have Big Data? (Most Likely!)
Brian Mitchell
Lead Senior Consultant, Microsoft Big Data COE

Brian by the Numbers
  14+ years working with SQL Server
  7+ years at Microsoft
  Microsoft Certified Master, SQL Server 2008
  Analysis Services Maestro, SSAS 2008
  3+ years working with SQL Server PDW
  Facebook Big Data Research: 68% Brian, 30% Bryan, 2% Other
  Blog: http://brianwmitchell.com (#1 blog on SQL PDW)
  Twitter: @brianwmitchell (woeful mid-200s followers)

Session Objectives and Takeaways
Session Objective(s):
  Examining the three Vs of Big Data
  Common Big Data Sources
  Common Big Data Scenarios

The Transition to Big Data
[Diagram. Axis labels: Variety, Volume, Velocity. Other labels: fk/pk, k/v, big, small, pull, push. Platforms: SQL Server, PDW, HDInsight.]

Defining the Transition
  Volume
  Velocity
  Variety

Old World Order
Hans Lipperhey
New World Order

Old World Order
New World Order

CAPITALIZING MEANS USING BIG DATA - BUT WHAT IS IT?
[Figure. Vertical axis: Megabytes, Gigabytes, Terabytes, Petabytes. Horizontal axis: Data Complexity: Variety and Velocity. Regions: ERP/CRM, Web 2.0, Big Data. Data types shown: Payables, Payroll, Inventory, Contacts, Deal Tracking, Sales Pipeline, Web Logs, Digital Marketing, Search Marketing, Recommendations, Advertising, Collaboration, Mobile, eCommerce, Log files, Spatial & GPS coordinates, Data market feeds, eGov feeds, Weather, Text/image, Click stream, Wikis/blogs, Sensors/RFID/devices, Social sentiment, Audio/video.]

Big Data is
  Not the size of the data
  Not the cool new tools like Hadoop and R
  A new paradigm on how to collect and use data differently

Traditional DW/BI Environment
[Diagram labels: ETL, Data Warehouse, Transactional, OLAP, Backroom/Data Warehouse, Reporting]
Statements customers may be saying
  We need to parallelize data operations, but it's too costly & complex.
  The business can't get access to all the relevant data; we need external data.
  We can't match customer master data to live customer interactions.
  We can't force everything into a star schema.
  Our BI reports and charts don't tell us anything we didn't know.
  We are missing the ETL window; the data we needed didn't arrive on time.
  We can't predict with confidence if we can't explore data & develop our own models.

Type of data generated by sector

Why is Big Data Important
Companies are Inefficient

Fundamentals
Common Big Data Sources
  Telematics
  Text
  Time and Place
  RFID
  Smart-Grid
  Sensor
  Telemetry
  Social Networks

Common Big Data Algorithms
  Finding Similar Items
  Mining Data Streams
  Link Analysis
  Frequent Item Sets
  Clustering
  Advertising on the Web
  Recommendation Systems
  Mining Social Network Graphs

Finding Similar Items
  Similar Web Pages
  Collaborative Filtering
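One common way to make "similar web pages" concrete is to compare the sets of word shingles on each page with Jaccard similarity. The slides do not name a specific measure, so the sketch below is an assumption for illustration only; the function names (`shingles`, `jaccard`) and the page texts are hypothetical.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Text shorter than one window: treat the whole text as a single shingle.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined here as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical page texts, invented for illustration only.
page_a = "big data is a new paradigm for collecting and using data"
page_b = "big data is a new paradigm for storing and using data"
page_c = "the weather today is sunny with a light breeze"

sim_ab = jaccard(shingles(page_a), shingles(page_b))  # near-duplicate pages
sim_ac = jaccard(shingles(page_a), shingles(page_c))  # unrelated pages

print(f"a vs b: {sim_ab:.2f}")
print(f"a vs c: {sim_ac:.2f}")
```

Comparing every pair of pages this way does not scale to large collections; at volume, the exact set comparison is usually replaced with an approximation, which is outside this sketch.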
Mining Data Streams
  Sensor Data
  Image Data
  Internet and Web Traffic

Frequent Item Sets
  Related Concepts

Clustering

Recommendation Systems
  Content-based Systems
  Collaborative Filtering Systems
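Collaborative filtering systems, listed above, recommend items to a user based on ratings from users with similar tastes. The following is a minimal sketch of one user-based variant, assuming cosine similarity and a similarity-weighted average; the slides do not specify either choice. The ratings, user names, and item names are invented for illustration.

```python
from math import sqrt
from typing import Optional

# Hypothetical ratings on a 1-5 scale (user -> {item: rating}).
# Unrated items are simply absent. Values are invented, not taken from the slides.
ratings = {
    "u1": {"m1": 5, "m2": 4, "m3": 1},
    "u2": {"m1": 4, "m2": 5, "m4": 5},
    "u3": {"m3": 5, "m4": 1},
}


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse rating vectors (missing entries count as 0)."""
    dot = sum(u[i] * v[i] for i in u.keys() & v.keys())
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(user: str, item: str) -> Optional[float]:
    """Estimate a rating as the similarity-weighted average of other users' ratings."""
    num = 0.0
    den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        weight = cosine(ratings[user], their)
        num += weight * their[item]
        den += weight
    # None means no similar user has rated the item, so no estimate is possible.
    return num / den if den > 0 else None


estimate = predict("u1", "m4")
print(estimate)
```

Here `u1` agrees with `u2` far more than with `u3`, so the estimate for `m4` lands closer to `u2`'s rating of 5 than to `u3`'s rating of 1.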
Recommendation Systems: The Utility Matrix
[Table: utility matrix. Labels: A, B, C, D, E; LOTR1, LOTR2, LOTR3, TW1, TW2, TW3, BB, TWD. Values as extracted, in order: 4, 5, 1, 5, 5, 4, 2, 1, 5, 3, 3, 5, 2.]

Recommendation Systems: Applications
  Product Recommendations
  Movie Recommendations
  News Articles

Recommendation Systems: Populating the Utility Matrix
  Ask the User
  Inferences from Users' Behavior

Mining Social Network Graphs: Social Networks
  Social Networks
  Email Networks
  Telephone Networks

Making it Real
Putting it all together: Pig, Hive, Sqoop, SQL Azure, PowerPivot, PowerView
Brian Mitchell

Data Mining
R and SQL Server
R Programming Language
R includes:
  an effective data handling and storage facility
  a suite of operators for calculations on arrays, in particular matrices
  a large, coherent, integrated collection of intermediate tools for data analysis
  graphical facilities for data analysis and display, either on-screen or on hardcopy
  a programming language with conditionals, loops, user-defined recursive functions, and input and output facilities

R Graphing Capabilities

R & SQL Server
Brian Mitchell

Social Network Analysis
NodeXL
NodeXL
With NodeXL, you can enter a network edge list in a worksheet, click a button, and see your graph, all in the familiar environment of the Excel window.
http://nodexl.codeplex.com/releases/view/104762
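To show what a network edge list contains, the sketch below turns a small list of vertex pairs into an adjacency map and counts each vertex's connections. This is plain Python for illustration, not NodeXL's own interface; the edge data and the helper names (`build_adjacency`, `degrees`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical edge list: one (vertex1, vertex2) pair per row,
# similar in shape to rows typed into a worksheet. Names are invented.
edges = [
    ("ann", "bob"),
    ("ann", "cat"),
    ("bob", "cat"),
    ("cat", "dan"),
]


def build_adjacency(edge_list):
    """Build an undirected adjacency map: vertex -> set of neighbors."""
    adjacency = defaultdict(set)
    for a, b in edge_list:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def degrees(adjacency):
    """Degree of each vertex, i.e. its number of distinct neighbors."""
    return {vertex: len(neighbors) for vertex, neighbors in adjacency.items()}


adjacency = build_adjacency(edges)
degree = degrees(adjacency)
most_connected = max(degree, key=degree.get)

print(degree)
print(most_connected)
```

Degree is the simplest measure of how connected a vertex is; graph tools typically offer richer metrics computed from the same edge list.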
NodeXL
Brian Mitchell

Predictive Analytics

Predictive Analytics
"The alternative to thinking ahead would be to think backwards . . . and that's just remembering."
Dr. Sheldon Cooper, The Big Bang Theory

Analytics Maturity
  Performance Analytics
  Predictive Analytics
  Prescriptive Analytics
Big Data Solutions drives Analytics Solutions Maturity
2013 PROS, Inc. All rights

Traditional DW/BI Environment
[Diagram labels: ETL, Data Warehouse, Transactional, OLAP, Backroom/Data Warehouse, Reporting]

Tomorrow's DW/BI Environment
[Diagram labels: ETL, Transactional, New Data Sources (Social Networks, Sensor Data, Log Data, RFID Data, Automated Data), Data Warehouse, HDInsight, Business Critical, OLAP, Backroom/Data Warehouse, Reporting]

Related Sessions on Big Data
  DBI-B339 Predictive Analytics with Microsoft Big Data
    Val Fontama and Saptak Sen, Tuesday June 4 @ 3:15PM
  DBI-B401 Enriching Big Data for Analysis
    Adam Jorgensen and Lara Rubbelke, Thursday June 6 @ 8:30AM
  DBI-B334 Data Management in Microsoft HDInsight: How to Move and Store Your Data
    Mike Flasko, Thursday June 6 @ 1:00PM

Questions?

Track Resources
  Download Data Explorer
  Windows Azure
  Download Geoflow
  SQL Server Website
  Microsoft Virtual Academy (MVA)
  Hands-On Labs
  Get Certified!
  @sqlserver

Resources
  Sessions on Demand: http://channel9.msdn.com/Events/TechEd
  Learning: Microsoft Certification & Training Resources, www.microsoft.com/learning
  TechNet: Resources for IT Professionals, http://microsoft.com/technet
  msdn: Resources for Developers, http://microsoft.com/msdn

Complete an evaluation on CommNet and enter to win!
Evaluate this session

© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.