Virtualizing Hadoop
Virtualizing Hadoop: How to Install, Deploy, and Optimize Hadoop in a Virtualized Architecture (VMware Press Technology)

About the Book

Plan and Implement Hadoop Virtualization for Maximum Performance, Scalability, and Business Agility

Enterprises running Hadoop must absorb rapid changes in big data ecosystems, frameworks, products, and workloads. Virtualized approaches can offer important advantages in speed, flexibility, and elasticity. Now, a world-class team of enterprise virtualization and big data experts guides you through the choices, considerations, and tradeoffs surrounding Hadoop virtualization. The authors help you decide whether to virtualize Hadoop, deploy Hadoop in the cloud, or integrate conventional and virtualized approaches in a blended solution.

First, Virtualizing Hadoop reviews big data and Hadoop from the standpoint of the virtualization specialist. The authors demystify MapReduce, YARN, and HDFS and guide you through each stage of Hadoop data management. Next, they turn the tables, introducing big data experts to modern virtualization concepts and best practices.

Finally, they bring Hadoop and virtualization together, guiding you through the decisions you’ll face in planning, deploying, provisioning, and managing virtualized Hadoop. From security to multitenancy to day-to-day management, you’ll find reliable answers for choosing your best Hadoop strategy and executing it.
Coverage includes the following:
  • Reviewing the frameworks, products, distributions, use cases, and roles associated with Hadoop
  • Understanding YARN resource management, HDFS storage, and I/O
  • Designing data ingestion, movement, and organization for modern enterprise data platforms
  • Defining SQL engine strategies to meet strict SLAs
  • Considering security, data isolation, and scheduling for multitenant environments
  • Deploying Hadoop as a service in the cloud
  • Reviewing the essential concepts, capabilities, and terminology of virtualization
  • Applying current best practices, guidelines, and key metrics for Hadoop virtualization
  • Managing multiple Hadoop frameworks and products as one unified system
  • Virtualizing master and worker nodes to maximize availability and performance
  • Installing and configuring Linux for a Hadoop environment

Table of Contents:
Foreword
Preface

Part I: Introduction to Hadoop

Chapter 1 Understanding the Big Data World
  The Data Revolution
  Traditional Data Systems
    Semi-Structured and Unstructured Data
    Causation and Correlation
    Data Challenges
  The Modern Data Architecture
  Organizational Transformations
  Industry Transformation
  Summary

Chapter 2 Hadoop Fundamental Concepts
  Types of Data in Hadoop
  Use Cases
  What Is Hadoop?
  Hadoop Distributions
  Hadoop Frameworks
  NoSQL Databases
    What Is NoSQL?
  A Hadoop Cluster
  Hadoop Software Processes
    Hadoop Hardware Profiles
  Roles in the Hadoop Environment
  Summary

Chapter 3 YARN and HDFS
  A Hadoop Cluster Is Distributed
  Hadoop Directory Layouts
    Hadoop Operating System Users
  The Hadoop Distributed File System
    YARN Logging
    The NameNode
    The DataNode
    Block Placement
    NameNode Configurations and Managing Metadata
  Rack Awareness
    Block Management
    The Balancer
    Maintaining Data Integrity in the Cluster
  Quotas and Trash
  YARN and the YARN Processing Model
    Running Applications on YARN
    Resource Schedulers
    Benchmarking
    TeraSort Benchmarking Suite
  Summary

Chapter 4 The Modern Data Platform
  Designing a Hadoop Cluster
    Enterprise Data Movement
  Summary

Chapter 5 Data Ingestion
  Extraction, Loading, and Transformation (ELT)
    Sqoop: Data Movement with SQL Sources
    Flume: Streaming Data
    Oozie: Scheduling and Workflow
    Falcon: Data Lifecycle Management
    Kafka: Real-time Data Streaming
  Summary

Chapter 6 Hadoop SQL Engines
  Where SQL Was Born
  SQL in Hadoop
  Hadoop SQL Engines
    Selecting the SQL Tool For Hadoop
  Now Getting Groovy with Hive and Pig
    Hive
    HCatalog
    Pig
  Summary

Chapter 7 Multitenancy in Hadoop
  Securing the Access
    Authentication
    Auditing
    Authorization
    Data Protection
    Isolating the Data
    Isolating the Process
  Summary

Part II: Introduction to Virtualization

Chapter 8 Virtualization Fundamentals
  Why Virtualize Hadoop?
    Introduction to Virtualization
  Summary
  References

Chapter 9 Best Practices for Virtualizing Hadoop
  Running Virtualized Hadoop with Purpose and Discipline
    The Discipline of Purpose Starts with a Clear Target
    Virtualizing Different Tiers of Hadoop
    Industry Best Practices
  Summary

Part III: Virtualizing Hadoop

Chapter 10 Virtualizing Hadoop
  How Are Hadoop Ecosystems Going to Be Managed?
    Building an Enterprise Hadoop Platform That Is Agile and Flexible
    Clarification of Terms
    The Journey from Bare-Metal to Virtualization
  Why Consider Virtualizing Hadoop?
    Benefits of Virtualizing Hadoop
    Virtualized Hadoop Can Run as Fast or Faster Than Native
    Coordination and Cross-Purpose Specialization Is the Future
    Barriers Can Be Organizational
    Virtualization Is Not an All or Nothing Option
    Rapid Provisioning and Improving Quality of Development and Test Environments
    Improve High Availability with Virtualization
    Use Virtualization to Leverage Hadoop Workloads
    Hadoop in the Cloud
    Big Data Extensions
    The Path to Virtualization
    The Software-Defined Data Center
    Virtualizing the Network
    vRealize Suite
  Summary
  References

Chapter 11 Virtualizing Hadoop Master Servers
  Virtualizing Servers in a Hadoop Cluster
    Virtualizing the Environment Around Hadoop
    Virtualizing the Master Hadoop Servers
    Virtualizing Without the SAN
  Summary

Chapter 12 Virtualizing the Hadoop Worker Nodes
  A Brief Introduction to the Worker Nodes in Hadoop
  Deployment Models for Hadoop Clusters
    The Combined Model
    The Separated Model
    Network Effects of the Data-Compute Separation
    The Shared-Storage Approach to the Data-Compute Separated Model
    Local Disks for the Application’s Temporary Data
    The Shared Storage Architecture Model Using Network-Attached Storage (NAS)
    Deployment Model Summary
  Best Practices for Virtualizing Hadoop Workers
    Disk I/O
  The Hadoop Virtualization Extensions (HVE)
  Summary
  References
  Resources

Chapter 13 Deploying Hadoop as a Service in the Private Cloud
  The Cloud Context
    Stakeholders for Hadoop
    Overview of the Solution Architecture
  Summary
  References

Chapter 14 Understanding the Installation of Hadoop
  Map the Right Solutions to the Right Use Case
    Thoughts About Installing Hadoop
  Configuring Repositories
    Installing HDP 2.2
    Environment Preparation
  Setting Up the Hadoop Configuration
  Starting HDFS and YARN
    Start YARN
    Verifying MapReduce Functionality
  Installing and Configuring Hive
  Installing and Configuring MySQL Database
  Installing and Configuring Hive and HCatalog
  Summary

Chapter 15 Configuring Linux for Hadoop
  Supported Linux Platforms
  Different Deployment Models
  Linux Golden Templates
    Building a Linux Enterprise Hadoop Platform
    Selecting the Linux Distribution
  Optimal Linux Kernel Parameters and System Settings
    epoll
    Disable Swap Space
    Disable Security During Install
    IO Scheduler Tuning
    Check Transparent Huge Pages Configuration
    Limits.conf
    Partition Alignment for RDMs
    File System Considerations
    Lazy Count Parameter for XFS
    Mount Options
    I/O Scheduler
    Disk Read and Write Options
    Storage Benchmarking
    Java Version
    Set Up NTP
    Enable Jumbo Frames
    Additional Network Considerations
  Summary

Appendix A Hadoop Cluster Creation: A Prerequisite Checklist

Appendix B Big Data/Hadoop on VMware vSphere Reference Materials
  Deployment Guides
  Reference Architectures
  Customer Case Studies
  Performance
  vSphere Big Data Extensions (BDE)
  Other vSphere Features and Big Data

About the Authors:
George J. Trujillo, Jr. is an experienced corporate executive with exceptional communication skills. He is an expert in change management with strong skills in leadership, critical thinking, and data-driven decision making. George is an internationally recognized data architect, leader, and speaker in big data and cloud solutions. His background includes big data architecture, Hadoop (Hortonworks, Cloudera), data governance, schema design, metadata management, security, NoSQL, and BI. He has many industry recognitions, including Oracle Recognized Double ACE, Sun Ambassador for Sun Microsystems’ Application Middleware Platform, VMware Recognized vExpert, VMware Certified Instructor, MySQL’s Socrates Award, and MySQL Certified DBA. His leadership in the user community includes the Independent Oracle Users Group (IOUG) board of directors, president of the IOUG Cloud SIG, chair of the RMOUG Big Data SIG, president of the RMOUG Cloud SIG, the Oracle Fusion Council and Oracle Beta Leadership Council, election to the IOUG’s “Oracles of Oracle” circle, and master presenter for the IOUG’s Master Series. His positions have included vice president of big data architecture in the financial services industry, master principal big data specialist at Hortonworks, tier one data specialist for the VMware Center of Excellence, and CEO of a professional services and training organization.

Charles Kim is the president of Viscosity North America, a niche consulting organization specializing in big data, Oracle Exadata/RAC, and virtualization. Charles is an architect in Hadoop/big data, Linux infrastructure, cloud, virtualization, engineered systems, and Oracle clustering technologies. Charles is an author with Oracle Press, Pearson, and APress in Oracle, Hadoop, and Linux technology stacks. He holds certifications in Oracle, VMware, Red Hat Linux, and Microsoft and has more than 23 years of IT experience on mission- and business-critical systems. Charles presents regularly at VMworld, Oracle OpenWorld, IOUG, and various local/regional user group conferences. He is an Oracle ACE director, VMware vExpert, Oracle Certified DBA, Certified Exadata Specialist, and a Certified RAC Expert. Charles’s books include the following:
  • Oracle Database 11g New Features for DBA and Developers
  • Linux Recipes for Oracle DBAs
  • Oracle Data Guard 11g Handbook
  • Virtualizing Business Critical Oracle Databases: Database as a Service
  • Oracle ASM 12c Pocket Reference Guide
  • Expert Exadata Handbook
Charles is the president of the Cloud Computing (and Virtualization) SIG for the Independent Oracle User Group. Charles blogs regularly at the DBAExpert.com blog site. His LinkedIn profile is http://www.linkedin.com/in/chkim. His Twitter tag is @racdba.

Steven Jones is a 16-year veteran of technical training with experience in UNIX, networking, database technology, virtualization, and big data. Steven works at VMware as a VMware Certified Instructor; VCA; VCP 4, 5, 6; and vExpert 2014, 2015. He is a coauthor of Virtualize Oracle Business Critical Databases: Database Infrastructure as a Service, by Charles Kim, George Trujillo, Steven Jones, and Sudhir Balasubramanian (iBooks, 2014). He spoke at VMworld 2013 in San Francisco and Barcelona on “Virtualizing Mission Critical Oracle RAC with vC Ops,” and has been a co-speaker worldwide for the VMware Education SDDC Intensive Workshop. Steven seeks to bring innovation, analogy, and narrative to understanding and mastering information technology as a service.

Rommel Garcia is a senior solutions engineer at Hortonworks, a leading open source company driving the adoption of Hadoop. Rommel has spent the past few years focusing on the design, installation, and deployment of large-scale Hadoop ecosystems. He has helped organizations implement security best practices and guidelines for Hadoop platforms. He has performance tuned Hadoop clusters for organizations ranging from fast-growing startups to Fortune 100 companies. Rommel is a nationally recognized speaker at Hadoop and big data conferences. He is also well known for his expertise in performance tuning Java applications and middle-tier platforms. He has a BS in electronics engineering and an MS degree in computer science. Rommel resides in Atlanta with his wife, Elizabeth, and his children, Mila and Braden.

Justin Murray is a senior technical marketing architect at VMware. He holds a BA and a post-graduate diploma in computer science from University College Cork in Ireland. Justin has worked in software engineering, technical training, and consulting in various companies in the UK and the United States. Since 2007, he has been working with VMware’s partner companies to validate and optimize big data and other next-generation application workloads on VMware vSphere.




Product Details
  • ISBN-13: 9780133811117
  • Publisher: Pearson Education (US)
  • Publisher Imprint: VMware Press
  • Language: English
  • Series Title: VMware Press Technology
  • Weight: 1 gr
  • ISBN-10: 0133811115
  • Publisher Date: 04 Jul 2015
  • Binding: Digital download
  • No of Pages: 480
  • Sub Title: How to Install, Deploy, and Optimize Hadoop in a Virtualized Architecture

