Virtualizing Hadoop: How to Install, Deploy, and Optimize Hadoop in a Virtualized Architecture (VMware Press Technology)

Name: Virtualizing Hadoop: How to Install, Deploy, and Optimize Hadoop in a Virtualized Architecture (VMware Press Technology)
Brand: Pearson Education (US)
SKU: 0133811115
Price: 166 AED
Availability: Out of Stock
ISBN: 9780133811117

(Digital download) | Released: 04 Jul 2015

By: Steve Jones (Author), Charles Kim (Author), Justin Murray (Author), George Trujillo (Author), Rommel Garcia (Author) | Publisher: Pearson Education (US) | Publisher Imprint: VMware Press

About the Book

Plan and Implement Hadoop Virtualization for Maximum Performance, Scalability, and Business Agility

Enterprises running Hadoop must absorb rapid changes in big data ecosystems, frameworks, products, and workloads. Virtualized approaches can offer important advantages in speed, flexibility, and elasticity. Now, a world-class team of enterprise virtualization and big data experts guides you through the choices, considerations, and tradeoffs surrounding Hadoop virtualization. The authors help you decide whether to virtualize Hadoop, deploy Hadoop in the cloud, or integrate conventional and virtualized approaches in a blended solution.

First, Virtualizing Hadoop reviews big data and Hadoop from the standpoint of the virtualization specialist. The authors demystify MapReduce, YARN, and HDFS and guide you through each stage of Hadoop data management. Next, they turn the tables, introducing big data experts to modern virtualization concepts and best practices. Finally, they bring Hadoop and virtualization together, guiding you through the decisions you’ll face in planning, deploying, provisioning, and managing virtualized Hadoop. From security to multitenancy to day-to-day management, you’ll find reliable answers for choosing your best Hadoop strategy and executing it.

Coverage includes the following:
• Reviewing the frameworks, products, distributions, use cases, and roles associated with Hadoop
• Understanding YARN resource management, HDFS storage, and I/O
• Designing data ingestion, movement, and organization for modern enterprise data platforms
• Defining SQL engine strategies to meet strict SLAs
• Considering security, data isolation, and scheduling for multitenant environments
• Deploying Hadoop as a service in the cloud
• Reviewing the essential concepts, capabilities, and terminology of virtualization
• Applying current best practices, guidelines, and key metrics for Hadoop virtualization
• Managing multiple Hadoop frameworks and products as one unified system
• Virtualizing master and worker nodes to maximize availability and performance
• Installing and configuring Linux for a Hadoop environment

Table of Contents:
Foreword
Preface

Part I: Introduction to Hadoop

Chapter 1 Understanding the Big Data World
    The Data Revolution
    Traditional Data Systems
        Semi-Structured and Unstructured Data
        Causation and Correlation
        Data Challenges
    The Modern Data Architecture
    Organizational Transformations
    Industry Transformation
    Summary

Chapter 2 Hadoop Fundamental Concepts
    Types of Data in Hadoop
    Use Cases
    What Is Hadoop?
    Hadoop Distributions
    Hadoop Frameworks
    NoSQL Databases
        What Is NoSQL?
    A Hadoop Cluster
    Hadoop Software Processes
        Hadoop Hardware Profiles
    Roles in the Hadoop Environment
    Summary

Chapter 3 YARN and HDFS
    A Hadoop Cluster Is Distributed
    Hadoop Directory Layouts
        Hadoop Operating System Users
    The Hadoop Distributed File System
        YARN Logging
        The NameNode
        The DataNode
        Block Placement
        NameNode Configurations and Managing Metadata
    Rack Awareness
        Block Management
        The Balancer
        Maintaining Data Integrity in the Cluster
    Quotas and Trash
    YARN and the YARN Processing Model
        Running Applications on YARN
        Resource Schedulers
        Benchmarking
        TeraSort Benchmarking Suite
    Summary

Chapter 4 The Modern Data Platform
    Designing a Hadoop Cluster
        Enterprise Data Movement
    Summary

Chapter 5 Data Ingestion
    Extraction, Loading, and Transformation (ELT)
        Sqoop: Data Movement with SQL Sources
        Flume: Streaming Data
        Oozie: Scheduling and Workflow
        Falcon: Data Lifecycle Management
        Kafka: Real-time Data Streaming
    Summary

Chapter 6 Hadoop SQL Engines
    Where SQL Was Born
    SQL in Hadoop
    Hadoop SQL Engines
        Selecting the SQL Tool For Hadoop
    Now Getting Groovy with Hive and Pig
        Hive
        HCatalog
        Pig
    Summary

Chapter 7 Multitenancy in Hadoop
    Securing the Access
        Authentication
        Auditing
        Authorization
        Data Protection
        Isolating the Data
        Isolating the Process
    Summary

Part II: Introduction to Virtualization

Chapter 8 Virtualization Fundamentals
    Why Virtualize Hadoop?
        Introduction to Virtualization
    Summary
    References

Chapter 9 Best Practices for Virtualizing Hadoop
    Running Virtualized Hadoop with Purpose and Discipline
        The Discipline of Purpose Starts with a Clear Target
        Virtualizing Different Tiers of Hadoop
        Industry Best Practices
    Summary

Part III: Virtualizing Hadoop

Chapter 10 Virtualizing Hadoop
    How Are Hadoop Ecosystems Going to Be Managed?
        Building an Enterprise Hadoop Platform That Is Agile and Flexible
        Clarification of Terms
        The Journey from Bare-Metal to Virtualization
    Why Consider Virtualizing Hadoop?
        Benefits of Virtualizing Hadoop
        Virtualized Hadoop Can Run as Fast or Faster Than Native
        Coordination and Cross-Purpose Specialization Is the Future
        Barriers Can Be Organizational
        Virtualization Is Not an All or Nothing Option
        Rapid Provisioning and Improving Quality of Development and Test Environments
        Improve High Availability with Virtualization
        Use Virtualization to Leverage Hadoop Workloads
        Hadoop in the Cloud
        Big Data Extensions
        The Path to Virtualization
        The Software-Defined Data Center
        Virtualizing the Network
        vRealize Suite
    Summary
    References

Chapter 11 Virtualizing Hadoop Master Servers
    Virtualizing Servers in a Hadoop Cluster
        Virtualizing the Environment Around Hadoop
        Virtualizing the Master Hadoop Servers
        Virtualizing Without the SAN
    Summary

Chapter 12 Virtualizing the Hadoop Worker Nodes
    A Brief Introduction to the Worker Nodes in Hadoop
    Deployment Models for Hadoop Clusters
        The Combined Model
        The Separated Model
        Network Effects of the Data-Compute Separation
        The Shared-Storage Approach to the Data-Compute Separated Model
        Local Disks for the Application’s Temporary Data
        The Shared Storage Architecture Model Using Network-Attached Storage (NAS)
        Deployment Model Summary
    Best Practices for Virtualizing Hadoop Workers
        Disk I/O
    The Hadoop Virtualization Extensions (HVE)
    Summary
    References
    Resources

Chapter 13 Deploying Hadoop as a Service in the Private Cloud
    The Cloud Context
        Stakeholders for Hadoop
        Overview of the Solution Architecture
    Summary
    References

Chapter 14 Understanding the Installation of Hadoop
    Map the Right Solutions to the Right Use Case
        Thoughts About Installing Hadoop
    Configuring Repositories
        Installing HDP 2.2
        Environment Preparation
    Setting Up the Hadoop Configuration
    Starting HDFS and YARN
        Start YARN
        Verifying MapReduce Functionality
    Installing and Configuring Hive
    Installing and Configuring MySQL Database
    Installing and Configuring Hive and HCatalog
    Summary

Chapter 15 Configuring Linux for Hadoop
    Supported Linux Platforms
    Different Deployment Models
    Linux Golden Templates
        Building a Linux Enterprise Hadoop Platform
        Selecting the Linux Distribution
    Optimal Linux Kernel Parameters and System Settings
        epoll
        Disable Swap Space
        Disable Security During Install
        IO Scheduler Tuning
        Check Transparent Huge Pages Configuration
        Limits.conf
        Partition Alignment for RDMs
        File System Considerations
        Lazy Count Parameter for XFS
        Mount Options
        I/O Scheduler
        Disk Read and Write Options
        Storage Benchmarking
        Java Version
        Set Up NTP
        Enable Jumbo Frames
        Additional Network Considerations
    Summary

Appendix A Hadoop Cluster Creation: A Prerequisite Checklist

Appendix B Big Data/Hadoop on VMware vSphere Reference Materials
    Deployment Guides
    Reference Architectures
    Customer Case Studies
    Performance
    vSphere Big Data Extensions (BDE)
    Other vSphere Features and Big Data

About the Authors:

George J. Trujillo, Jr. is an experienced corporate executive with exceptional communication skills. He is an expert in change management with strong leadership skills, critical thinking, and data-driven decisions. George is an internationally recognized data architect, leader, and speaker in big data and cloud solutions. His background includes Big Data Architecture, Hadoop (Hortonworks, Cloudera), data governance, schema design, metadata management, security, NoSQL, and BI. He has many industry recognitions, including Oracle Recognized Double ACE, Sun Ambassador for Sun Microsystems’ Application Middleware Platform, VMware Recognized vExpert, VMware Certified Instructor, MySQL’s Socrates Award, and MySQL Certified DBA. His leadership in the user community includes the Independent Oracle Users Group (IOUG) board of directors, president of the IOUG Cloud SIG, chair of the RMOUG Big Data SIG, president of the RMOUG Cloud SIG, the Oracle Fusion Council and Oracle Beta Leadership Council, election to the IOUG’s “Oracles of Oracle” circle, and master presenter for the IOUG’s Master Series. His positions have included vice president of big data architecture in the financial services industry, master principal big data specialist at Hortonworks, tier one data specialist for the VMware Center of Excellence, and CEO of a professional services and training organization.

Charles Kim is the president of Viscosity North America, a niche consulting organization specializing in big data, Oracle Exadata/RAC, and virtualization. Charles is an architect in Hadoop/big data, Linux infrastructure, cloud, virtualization, engineered systems, and Oracle clustering technologies. Charles is an author with Oracle Press, Pearson, and APress in Oracle, Hadoop, and Linux technology stacks. He holds certifications in Oracle, VMware, Red Hat Linux, and Microsoft and has more than 23 years of IT experience on mission- and business-critical systems. Charles presents regularly at VMworld, Oracle OpenWorld, IOUG, and various local/regional user group conferences. He is an Oracle ACE director, VMware vExpert, Oracle Certified DBA, Certified Exadata Specialist, and a Certified RAC Expert. Charles’s books include the following:
• Oracle Database 11g New Features for DBA and Developers
• Linux Recipes for Oracle DBAs
• Oracle Data Guard 11g Handbook
• Virtualizing Business Critical Oracle Databases: Database as a Service
• Oracle ASM 12c Pocket Reference Guide
• Expert Exadata Handbook
Charles is the president of the Cloud Computing (and Virtualization) SIG for the Independent Oracle User Group. Charles blogs regularly at DBAExpert.com/blog. His LinkedIn profile is http://www.linkedin.com/in/chkim. His Twitter tag is @racdba.

Steven Jones is a 16-year veteran of technical training with experience in UNIX, networking, database technology, virtualization, and big data. Steven works at VMware as a VMware Certified Instructor and holds the VCA; VCP 4, 5, and 6; and vExpert 2014 and 2015 designations. He is a coauthor of Virtualize Oracle Business Critical Databases: Database Infrastructure as a Service, by Charles Kim, George Trujillo, Steven Jones, and Sudhir Balasubramanian (iBooks, 2014). He presented “Virtualizing Mission Critical Oracle RAC with vC Ops” at VMworld 2013 in San Francisco and Barcelona, and has been a co-speaker worldwide for the VMware Education SDDC Intensive Workshop. Steven seeks to bring innovation, analogy, and narrative to understanding and mastering information technology as a service.

Rommel Garcia is a senior solutions engineer at Hortonworks, a leading open source company driving the adoption of Hadoop. Rommel has spent the past few years focusing on the design, installation, and deployment of large-scale Hadoop ecosystems. He has helped organizations implement security best practices and guidelines for Hadoop platforms. He has performance-tuned Hadoop clusters for organizations ranging from fast-growing startups to Fortune 100 companies. Rommel is a nationally recognized speaker at Hadoop and big data conferences. He is also well known for his expertise in performance tuning Java applications and middle-tier platforms. He has a BS in electronics engineering and an MS in computer science. Rommel resides in Atlanta with his wife, Elizabeth, and his children, Mila and Braden.

Justin Murray is a senior technical marketing architect at VMware. He holds a BA and a post-graduate diploma in computer science from University College Cork in Ireland. Justin has worked in software engineering, technical training, and consulting in various companies in the UK and the United States. Since 2007, he has been working with VMware’s partner companies to validate and optimize big data and other next-generation application workloads on VMware vSphere.

Product Details

ISBN-13: 9780133811117
Publisher: Pearson Education (US)
Publisher Imprint: VMware Press
Language: English
Series Title: VMware Press Technology
Weight: 1 gr

ISBN-10: 0133811115
Publication Date: 04 Jul 2015
Binding: Digital download
No of Pages: 480
Sub Title: How to Install, Deploy, and Optimize Hadoop in a Virtualized Architecture
