About the Book
        
        Sams Teach Yourself Big Data Analytics with Microsoft HDInsight in 24 Hours
 
 In just 24 lessons of one hour or less, Sams Teach Yourself Big Data Analytics with Microsoft HDInsight in 24 Hours helps you leverage Hadoop’s power on a flexible, scalable cloud platform using Microsoft’s newest business intelligence, visualization, and productivity tools.
 
 This book’s straightforward, step-by-step approach shows you how to provision, configure, monitor, and troubleshoot HDInsight and use Hadoop cloud services to solve real analytics problems. You’ll gain more of Hadoop’s benefits with less complexity, even if you’re completely new to Big Data analytics. Every lesson builds on what you’ve already learned, giving you a rock-solid foundation for real-world success.
 
 Practical, hands-on examples show you how to apply what you learn
 Quizzes and exercises help you test your knowledge and stretch your skills
 Notes and tips point out shortcuts and solutions
  
 Learn how to… 
 ·         Master core Big Data and NoSQL concepts, value propositions, and use cases
 ·         Work with key Hadoop features, such as HDFS2 and YARN
 ·         Quickly install, configure, and monitor Hadoop (HDInsight) clusters in the cloud
 ·         Automate provisioning, customize clusters, install additional Hadoop projects, and administer clusters 
 ·         Integrate, analyze, and report with Microsoft BI and Power BI
 ·         Automate workflows for data transformation, integration, and other tasks
 ·         Use Apache HBase on HDInsight
 ·         Use Sqoop or SSIS to move data to or from HDInsight
 ·         Perform R-based statistical computing on HDInsight datasets
 ·         Accelerate analytics with Apache Spark
 ·         Run real-time analytics on high-velocity data streams
 ·         Write MapReduce, Hive, and Pig programs
  
 
 Register your book at informit.com/register for convenient access to downloads, updates, and corrections as they become available. 
Table of Contents: 
Introduction
 Part I: Understanding Big Data, Hadoop 1.0, and 2.0
 Hour 1: Introduction of Big Data, NoSQL, and Business Value Proposition
 Types of Analysis
 Types of Data
 Big Data
 Managing Big Data
 NoSQL Systems
 Big Data, NoSQL Systems, and the Business Value Proposition
 Application of Big Data and Big Data Solutions
 Summary
 Q&A
 Hour 2: Introduction to Hadoop, Its Architecture, Ecosystem, and Microsoft Offerings
 What Is Apache Hadoop?
 Architecture of Hadoop and Hadoop Ecosystems
 What’s New in Hadoop 2.0
 Architecture of Hadoop 2.0
 Tools and Technologies Needed with Big Data Analytics
 Major Players and Vendors for Hadoop
 Deployment Options for Microsoft Big Data Solutions
 Summary
 Q&A
 Hour 3: Hadoop Distributed File System Versions 1.0 and 2.0
 Introduction to HDFS
 HDFS Architecture
 Rack Awareness
 WebHDFS
 Accessing and Managing HDFS Data
 What’s New in HDFS 2.0
 Summary
 Q&A
 Hour 4: The MapReduce Job Framework and Job Execution Pipeline
 Introduction to MapReduce
 MapReduce Architecture
 MapReduce Job Execution Flow
 Summary
 Q&A
 Hour 5: MapReduce–Advanced Concepts and YARN 
 DistributedCache
 Hadoop Streaming
 MapReduce Joins
 Bloom Filter
 Performance Improvement
 Handling Failures
 Counter
 YARN
 Uber-Tasking Optimization
 Failures in YARN
 Resource Manager High Availability and Automatic Failover in YARN
 Summary
 Q&A
 Part II: Getting Started with HDInsight and Understanding Its Different Components
 Hour 6: Getting Started with HDInsight, Provisioning Your HDInsight Service Cluster, and Automating HDInsight Cluster Provisioning
 Introduction to Microsoft Azure
 Understanding HDInsight Service
 Provisioning HDInsight on the Azure Management Portal
 Automating HDInsight Provisioning with PowerShell
 Managing and Monitoring HDInsight Cluster and Job Execution
 Summary
 Q&A
 Exercise
 Hour 7: Exploring Typical Components of HDFS Cluster 
 HDFS Cluster Components
 HDInsight Cluster Architecture
 High Availability in HDInsight
 Summary
 Q&A
 Hour 8: Storing Data in Microsoft Azure Storage Blob 
 Understanding Storage in Microsoft Azure
 Benefits of Azure Storage Blob over HDFS
 Azure Storage Explorer Tools
 Summary
 Q&A
 Hour 9: Working with Microsoft Azure HDInsight Emulator 
 Getting Started with HDInsight Emulator
 Setting Up Microsoft Azure Emulator for Storage
 Summary
 Q&A
 Part III: Programming MapReduce and HDInsight Script Action
 Hour 10: Programming MapReduce Jobs 
 MapReduce Hello World!
 Analyzing Flight Delays with MapReduce
 Serialization Frameworks for Hadoop
 Hadoop Streaming
 Summary
 Q&A
 Hour 11: Customizing the HDInsight Cluster with Script Action
 Identifying the Need for Cluster Customization
 Developing Script Action
 Consuming Script Action
 Running a Giraph job on a Customized HDInsight Cluster
 Testing Script Action with HDInsight Emulator
 Summary
 Q&A
 Part IV: Querying and Processing Big Data in HDInsight
 Hour 12: Getting Started with Apache Hive and Apache Tez in HDInsight
 Introduction to Apache Hive
 Getting Started with Apache Hive in HDInsight
 Azure HDInsight Tools for Visual Studio
 Programmatically Using the HDInsight .NET SDK
 Introduction to Apache Tez
 Summary
 Q&A
 Exercise
 Hour 13: Programming with Apache Hive, Apache Tez in HDInsight, and Apache HCatalog 
 Programming with Hive in HDInsight
 Using Tables in Hive
 Serialization and Deserialization
 Data Load Processes for Hive Tables
 Querying Data from Hive Tables
 Indexing in Hive
 Apache Tez in Action
 Apache HCatalog
 Summary
 Q&A
 Exercise
 Hour 14: Consuming HDInsight Data from Microsoft BI Tools over Hive ODBC Driver: Part 1
 Introduction to Hive ODBC Driver
 Introduction to Microsoft Power BI
 Accessing Hive Data from Microsoft Excel
 Summary
 Q&A
 Hour 15: Consuming HDInsight Data from Microsoft BI Tools over Hive ODBC Driver: Part 2
 Accessing Hive Data from PowerPivot
 Accessing Hive Data from SQL Server
 Accessing HDInsight Data from Power Query
 Summary
 Q&A
 Exercise
 Hour 16: Integrating HDInsight with SQL Server Integration Services 
 The Need for Data Movement
 Introduction to SSIS
 Analyzing On-time Flight Departure with SSIS
 Provisioning HDInsight Cluster
 Summary
 Q&A
 Hour 17: Using Pig for Data Processing 
 Introduction to Pig Latin
 Using Pig to Count Cancelled Flights
 Using HCatalog in a Pig Latin Script
 Submitting Pig Jobs with PowerShell
 Summary
 Q&A
 Hour 18: Using Sqoop for Data Movement Between RDBMS and HDInsight
 What Is Sqoop?
 Using Sqoop Import and Export Commands
 Using Sqoop with PowerShell
 Summary
 Q&A
 Part V: Managing Workflow and Performing Statistical Computing
 Hour 19: Using Oozie Workflows and Job Orchestration with HDInsight 
 Introduction to Oozie
 Determining On-time Flight Departure Percentage with Oozie
 Submitting an Oozie Workflow with HDInsight .NET SDK
 Coordinating Workflows with Oozie
 Oozie Compared to SSIS
 Summary
 Q&A
 Hour 20: Performing Statistical Computing with R 
 Introduction to R
 Integrating R with Hadoop
 Enabling R on HDInsight
 Summary
 Q&A
 Part VI: Performing Interactive Analytics and Machine Learning
 Hour 21: Performing Big Data Analytics with Spark 
 Introduction to Spark
 Spark Programming Model
 Blending SQL Querying with Functional Programs
 Summary
 Q&A
 Hour 22: Microsoft Azure Machine Learning 
 History of Traditional Machine Learning
 Introduction to Azure ML
 Azure ML Workspace
 Processes to Build Azure ML Solutions
 Getting Started with Azure ML
 Creating Predictive Models with Azure ML
 Publishing Azure ML Models as Web Services
 Summary
 Q&A
 Exercise
 Part VII: Performing Real-time Analytics
 Hour 23: Performing Stream Analytics with Storm
 Introduction to Storm
 Using SCP.NET to Develop Storm Solutions
 Analyzing Speed Limit Violation Incidents with Storm
 Summary
 Q&A
 Hour 24: Introduction to Apache HBase on HDInsight 
 Introduction to Apache HBase
 HBase Architecture
 Creating HDInsight Cluster with HBase
 Summary
 Q&A
  
 9780672337277   TOC   10/26/2015
     
About the Author : 
Arshad Ali has more than 13 years of experience in the computer industry. As a DB/DW/BI consultant in an end-to-end delivery role, he has worked on several enterprise-scale data warehousing and analytics projects, developing business intelligence and analytics solutions. He specializes in database, data warehousing, and business intelligence/analytics application design, development, and deployment at the enterprise level. He frequently works with SQL Server, Microsoft Analytics Platform System (APS, formerly known as SQL Server Parallel Data Warehouse [PDW]), HDInsight (Hadoop, Hive, Pig, HBase, and so on), SSIS, SSRS, SSAS, Service Broker, MDS, DQS, SharePoint, and PPS. He has also handled performance optimization for several projects, achieving significant performance gains.
 Arshad is a Microsoft Certified Solutions Expert (MCSE): SQL Server 2012 Data Platform, and a Microsoft Certified IT Professional (MCITP) in Microsoft SQL Server 2008: Database Development, Data Administration, and Business Intelligence. He is also certified in ITIL 2011 Foundation.
 
 He has developed applications in VB, ASP, .NET, ASP.NET, and C#. He is a Microsoft Certified Application Developer (MCAD) and Microsoft Certified Solution Developer (MCSD) for the .NET platform in Web, Windows, and Enterprise.
 
 Arshad has presented at several technical events and has written more than 200 articles related to DB, DW, BI, and BA technologies, best practices, processes, and performance optimization techniques on SQL Server, Hadoop, and related technologies. His articles have been published on several prominent sites.
 
 On the educational front, Arshad holds a Master in Computer Applications degree and a Master in Business Administration in IT degree.
 
 Arshad can be reached at arshad.ali@live.in, or visit http://arshadali.blogspot.in/ to connect with him.
  
 Manpreet Singh is a consultant and author with extensive expertise in architecture, design, and implementation of business intelligence and Big Data analytics solutions. He is passionate about enabling businesses to derive valuable insights from their data.
 
 Manpreet has been working on Microsoft technologies for more than 8 years, with a strong focus on Microsoft Business Intelligence Stack, SharePoint BI, and Microsoft’s Big Data Analytics Platforms (Analytics Platform System and HDInsight). He also specializes in Mobile Business Intelligence solution development and has helped businesses deliver a consolidated view of their data to their mobile workforces.
 
 Manpreet has coauthored books and technical articles on Microsoft technologies, focusing on the development of data analytics and visualization solutions with the Microsoft BI Stack and SharePoint. He holds a degree in computer science and engineering from Panjab University, India.
 
 
 Manpreet can be reached at manpreet.singh3@hotmail.com.