Expert Hadoop Administration
Expert Hadoop Administration: Managing, Tuning, and Securing Spark, YARN, and HDFS





About the Book

The Comprehensive, Up-to-Date Apache Hadoop Administration Handbook and Reference

“Sam Alapati has worked with production Hadoop clusters for six years. His unique depth of experience has enabled him to write the go-to resource for all administrators looking to spec, size, expand, and secure production Hadoop clusters of any size.” –Paul Dix, Series Editor

In Expert Hadoop® Administration, leading Hadoop administrator Sam R. Alapati brings together authoritative knowledge for creating, configuring, securing, managing, and optimizing production Hadoop clusters in any environment. Drawing on his experience with large-scale Hadoop administration, Alapati integrates action-oriented advice with carefully researched explanations of both problems and solutions. He covers an unmatched range of topics and offers an unparalleled collection of realistic examples.

Alapati demystifies complex Hadoop environments, helping you understand exactly what happens behind the scenes when you administer your cluster. You’ll gain unprecedented insight as you walk through building clusters from scratch and configuring high availability, performance, security, encryption, and other key attributes. The high-value administration skills you learn here will be indispensable no matter what Hadoop distribution you use or what Hadoop applications you run.

  • Understand Hadoop’s architecture from an administrator’s standpoint
  • Create simple and fully distributed clusters
  • Run MapReduce and Spark applications in a Hadoop cluster
  • Manage and protect Hadoop data and high availability
  • Work with HDFS commands, file permissions, and storage management
  • Move data, and use YARN to allocate resources and schedule jobs
  • Manage job workflows with Oozie and Hue
  • Secure, monitor, log, and optimize Hadoop
  • Benchmark and troubleshoot Hadoop

Table of Contents:
Foreword
Preface
Acknowledgments
About the Author

Part I: Introduction to Hadoop—Architecture and Hadoop Clusters

Chapter 1: Introduction to Hadoop and Its Environment
  • Hadoop—An Introduction
  • Cluster Computing and Hadoop Clusters
  • Hadoop Components and the Hadoop Ecosphere
  • What Do Hadoop Administrators Do?
  • Key Differences between Hadoop 1 and Hadoop 2
  • Distributed Data Processing: MapReduce and Spark, Hive and Pig
  • Data Integration: Apache Sqoop, Apache Flume and Apache Kafka
  • Key Areas of Hadoop Administration
  • Summary

Chapter 2: An Introduction to the Architecture of Hadoop
  • Distributed Computing and Hadoop
  • Hadoop Architecture
  • Data Storage—The Hadoop Distributed File System
  • Data Processing with YARN, the Hadoop Operating System
  • Summary

Chapter 3: Creating and Configuring a Simple Hadoop Cluster
  • Hadoop Distributions and Installation Types
  • Setting Up a Pseudo-Distributed Hadoop Cluster
  • Performing the Initial Hadoop Configuration
  • Operating the New Hadoop Cluster
  • Summary

Chapter 4: Planning for and Creating a Fully Distributed Cluster
  • Planning Your Hadoop Cluster
  • Going from a Single Rack to Multiple Racks
  • Creating a Multinode Cluster
  • Modifying the Hadoop Configuration
  • Starting Up the Cluster
  • Configuring Hadoop Services, Web Interfaces and Ports
  • Summary

Part II: Hadoop Application Frameworks

Chapter 5: Running Applications in a Cluster—The MapReduce Framework (and Hive and Pig)
  • The MapReduce Framework
  • Apache Hive
  • Apache Pig
  • Summary

Chapter 6: Running Applications in a Cluster—The Spark Framework
  • What Is Spark?
  • Why Spark?
  • The Spark Stack
  • Installing Spark
  • Spark Run Modes
  • Understanding the Cluster Managers
  • Spark and Data Access
  • Summary

Chapter 7: Running Spark Applications
  • The Spark Programming Model
  • Spark Applications
  • Architecture of a Spark Application
  • Running Spark Applications Interactively
  • Creating and Submitting Spark Applications
  • Configuring Spark Applications
  • Monitoring Spark Applications
  • Handling Streaming Data with Spark Streaming
  • Using Spark SQL for Handling Structured Data
  • Summary

Part III: Managing and Protecting Hadoop Data and High Availability

Chapter 8: The Role of the NameNode and How HDFS Works
  • HDFS—The Interaction between the NameNode and the DataNodes
  • Rack Awareness and Topology
  • HDFS Data Replication
  • How Clients Read and Write HDFS Data
  • Understanding HDFS Recovery Processes
  • Centralized Cache Management in HDFS
  • Hadoop Archival Storage, SSD and Memory (Heterogeneous Storage)
  • Summary

Chapter 9: HDFS Commands, HDFS Permissions and HDFS Storage
  • Managing HDFS through the HDFS Shell Commands
  • Using the dfsadmin Utility to Perform HDFS Operations
  • Managing HDFS Permissions and Users
  • Managing HDFS Storage
  • Rebalancing HDFS Data
  • Reclaiming HDFS Space
  • Summary

Chapter 10: Data Protection, File Formats and Accessing HDFS
  • Safeguarding Data
  • Data Compression
  • Hadoop File Formats
  • Using Hadoop WebHDFS and HttpFS
  • Summary

Chapter 11: NameNode Operations, High Availability and Federation
  • Understanding NameNode Operations
  • The Checkpointing Process
  • NameNode Safe Mode Operations
  • Configuring HDFS High Availability
  • HDFS Federation
  • Summary

Part IV: Moving Data, Allocating Resources, Scheduling Jobs and Security

Chapter 12: Moving Data Into and Out of Hadoop
  • Introduction to Hadoop Data Transfer Tools
  • Loading Data into HDFS from the Command Line
  • Copying HDFS Data between Clusters with DistCp
  • Ingesting Data from Relational Databases with Sqoop
  • Ingesting Data from External Sources with Flume
  • Ingesting Data with Kafka
  • Summary

Chapter 13: Resource Allocation in a Hadoop Cluster
  • Resource Allocation in Hadoop
  • The FIFO Scheduler
  • The Capacity Scheduler
  • The Fair Scheduler
  • Comparing the Capacity Scheduler and the Fair Scheduler
  • Summary

Chapter 14: Working with Oozie to Manage Job Workflows
  • Using Apache Oozie to Schedule Jobs
  • Oozie Architecture
  • Deploying Oozie in Your Cluster
  • Understanding Oozie Workflows
  • How Oozie Runs an Action
  • Creating an Oozie Workflow
  • Running an Oozie Workflow Job
  • Oozie Coordinators
  • Managing and Administering Oozie
  • Summary

Chapter 15: Securing Hadoop
  • Hadoop Security—An Overview
  • Hadoop Authentication with Kerberos
  • Hadoop Authorization
  • Auditing Hadoop
  • Securing Hadoop Data
  • Other Hadoop-Related Security Initiatives
  • Summary

Part V: Monitoring, Optimization and Troubleshooting

Chapter 16: Managing Jobs, Using Hue and Performing Routine Tasks
  • Using the YARN Commands to Manage Hadoop Jobs
  • Decommissioning and Recommissioning Nodes
  • ResourceManager High Availability
  • Performing Common Management Tasks
  • Managing the MySQL Database
  • Backing Up Important Cluster Data
  • Using Hue to Administer Your Cluster
  • Implementing Specialized HDFS Features
  • Summary

Chapter 17: Monitoring, Metrics and Hadoop Logging
  • Monitoring Linux Servers
  • Hadoop Metrics
  • Using Ganglia for Monitoring
  • Understanding Hadoop Logging
  • Using Hadoop’s Web UIs for Monitoring
  • Monitoring Other Hadoop Components
  • Summary

Chapter 18: Tuning the Cluster Resources, Optimizing MapReduce Jobs and Benchmarking
  • How to Allocate YARN Memory and CPU
  • Configuring Efficient Performance
  • Tuning Map and Reduce Tasks—What the Administrator Can Do
  • Optimizing Pig and Hive Jobs
  • Benchmarking Your Cluster
  • Hadoop Counters
  • Optimizing MapReduce
  • Summary

Chapter 19: Configuring and Tuning Apache Spark on YARN
  • Configuring Resource Allocation for Spark on YARN
  • Dynamic Resource Allocation when Running Spark on YARN
  • Storage Formats and Compressing Data
  • Monitoring Spark Applications
  • Tuning Garbage Collection
  • Tuning Spark Streaming Applications
  • Summary

Chapter 20: Optimizing Spark Applications
  • Revisiting the Spark Execution Model
  • Shuffle Operations and How to Minimize Them
  • Partitioning and Parallelism (Number of Tasks)
  • Optimizing Data Serialization and Compression
  • Understanding Spark’s SQL Query Optimizer
  • Caching Data
  • Summary

Chapter 21: Troubleshooting Hadoop—A Sampler
  • Space-Related Issues
  • Handling YARN Jobs That Are Stuck
  • JVM Memory-Allocation and Garbage-Collection Strategies
  • Handling Different Types of Failures
  • Troubleshooting Spark Jobs
  • Debugging Spark Applications
  • Summary

Chapter 22: Installing VirtualBox and Linux and Cloning the Virtual Machines
  • Installing Oracle VirtualBox
  • Installing Oracle Enterprise Linux
  • Cloning the Linux Server

Index




Product Details
  • ISBN-13: 9780134598154
  • Publisher: Pearson Education (US)
  • Publisher Imprint: Addison Wesley
  • Language: English
  • Subtitle: Managing, Tuning, and Securing Spark, YARN, and HDFS
  • ISBN-10: 0134598156
  • Publication Date: 27 Apr 2021
  • Binding: Digital download
  • No. of Pages: 848



