Practical Data Science with Hadoop and Spark
Practical Data Science with Hadoop and Spark: Designing and Building Effective Analytics at Scale






International Edition


About the Book

The Complete Guide to Data Science with Hadoop: For Technical Professionals, Businesspeople, and Students

Demand is soaring for professionals who can solve real data science problems with Hadoop and Spark. Practical Data Science with Hadoop® and Spark is your complete guide to doing just that. Drawing on immense experience with Hadoop and big data, three leading experts bring together everything you need: high-level concepts, deep-dive techniques, real-world use cases, practical applications, and hands-on tutorials.

The authors introduce the essentials of data science and the modern Hadoop ecosystem, explaining how Hadoop and Spark have evolved into an effective platform for solving data science problems at scale. In addition to comprehensive application coverage, the authors also provide useful guidance on the important steps of data ingestion, data munging, and visualization.

Once the groundwork is in place, the authors focus on specific applications, including machine learning, predictive modeling for sentiment analysis, clustering for document analysis, anomaly detection, and natural language processing (NLP).

This guide provides a strong technical foundation for those who want to do practical data science, and also presents business-driven guidance on how to apply Hadoop and Spark to optimize ROI of data science initiatives.

Learn
  • What data science is, how it has evolved, and how to plan a data science career
  • How data volume, variety, and velocity shape data science use cases
  • Hadoop and its ecosystem, including HDFS, MapReduce, YARN, and Spark
  • Data importation with Hive and Spark
  • Data quality, preprocessing, preparation, and modeling
  • Visualization: surfacing insights from huge data sets
  • Machine learning: classification, regression, clustering, and anomaly detection
  • Algorithms and Hadoop tools for predictive modeling
  • Cluster analysis and similarity functions
  • Large-scale anomaly detection
  • NLP: applying data science to human language

Table of Contents:
Foreword
Preface
Acknowledgments
About the Authors

Part I: Data Science with Hadoop - An Overview

Chapter 1: Introduction to Data Science
  • What Is Data Science?
  • Example: Search Advertising
  • A Bit of Data Science History
  • Becoming a Data Scientist
  • Building a Data Science Team
  • The Data Science Project Life Cycle
  • Managing a Data Science Project
  • Summary

Chapter 2: Use Cases for Data Science
  • Big Data - A Driver of Change
  • Business Use Cases
  • Summary

Chapter 3: Hadoop and Data Science
  • What Is Hadoop?
  • Hadoop’s Evolution
  • Hadoop Tools for Data Science
  • Why Hadoop Is Useful to Data Scientists
  • Summary

Part II: Preparing and Visualizing Data with Hadoop

Chapter 4: Getting Data into Hadoop
  • Hadoop as a Data Lake
  • The Hadoop Distributed File System (HDFS)
  • Direct File Transfer to Hadoop HDFS
  • Importing Data from Files into Hive Tables
  • Importing Data into Hive Tables Using Spark
  • Using Apache Sqoop to Acquire Relational Data
  • Using Apache Flume to Acquire Data Streams
  • Manage Hadoop Work and Data Flows with Apache Oozie
  • Apache Falcon
  • What’s Next in Data Ingestion?
  • Summary

Chapter 5: Data Munging with Hadoop
  • Why Hadoop for Data Munging?
  • Data Quality
  • The Feature Matrix
  • Summary

Chapter 6: Exploring and Visualizing Data
  • Why Visualize Data?
  • Creating Visualizations
  • Using Visualization for Data Science
  • Popular Visualization Tools
  • Visualizing Big Data with Hadoop
  • Summary

Part III: Applying Data Modeling with Hadoop

Chapter 7: Machine Learning with Hadoop
  • Overview of Machine Learning
  • Terminology
  • Task Types in Machine Learning
  • Big Data and Machine Learning
  • Tools for Machine Learning
  • The Future of Machine Learning and Artificial Intelligence
  • Summary

Chapter 8: Predictive Modeling
  • Overview of Predictive Modeling
  • Classification Versus Regression
  • Evaluating Predictive Models
  • Supervised Learning Algorithms
  • Building Big Data Predictive Model Solutions
  • Example: Sentiment Analysis
  • Summary

Chapter 9: Clustering
  • Overview of Clustering
  • Uses of Clustering
  • Designing a Similarity Measure
  • Clustering Algorithms
  • Example: Clustering Algorithms
  • Evaluating the Clusters and Choosing the Number of Clusters
  • Building Big Data Clustering Solutions
  • Example: Topic Modeling with Latent Dirichlet Allocation
  • Summary

Chapter 10: Anomaly Detection with Hadoop
  • Overview
  • Uses of Anomaly Detection
  • Types of Anomalies in Data
  • Approaches to Anomaly Detection
  • Tuning Anomaly Detection Systems
  • Building a Big Data Anomaly Detection Solution with Hadoop
  • Example: Detecting Network Intrusions
  • Summary

Chapter 11: Natural Language Processing
  • Natural Language Processing
  • Tooling for NLP in Hadoop
  • Textual Representations
  • Sentiment Analysis Example
  • Summary

Chapter 12: Data Science with Hadoop - The Next Frontier
  • Automated Data Discovery
  • Deep Learning
  • Summary

Appendix A: Book Web Page and Code Download

Appendix B: HDFS Quick Start
  • Quick Command Dereference

Appendix C: Additional Background on Data Science and Apache Hadoop and Spark
  • General Hadoop/Spark Information
  • Hadoop/Spark Installation Recipes
  • HDFS
  • MapReduce
  • Spark
  • Essential Tools
  • Machine Learning

Index




Product Details
  • ISBN-13: 9780134024141
  • Publisher: Pearson Education (US)
  • Publisher Imprint: Addison Wesley
  • Height: 234 mm
  • No. of Pages: 256
  • Spine Width: 16 mm
  • Weight: 458 g
  • ISBN-10: 0134024141
  • Publisher Date: 06 Feb 2017
  • Binding: Paperback
  • Language: English
  • Returnable: Y
  • Subtitle: Designing and Building Effective Analytics at Scale
  • Width: 179 mm





