About the Book
A practical guide to building a modern, GenAI-powered data platform with a Lakehouse foundation, covering MDM, data mesh, AI enablement, streaming pipelines, observability, and cloud-driven architectures for trusted analytics.
Key Features
Discover the characteristics of future-ready platforms: data mesh, automation, and observability
Design trustworthy data products with contracts, federated governance, and decentralized ownership
Understand how GenAI accelerates Lakehouse development and enables self‑service analytics
Book Description
Discover the defining hallmarks of future‑ready data platforms, including data mesh architectures, intelligent automation, and end‑to‑end data observability. Learn how to design and deliver trusted data products through data contracts, federated governance, decentralized domain ownership, and endorsed datasets. The book explores modern Lakehouse patterns with a strong focus on the medallion architecture, explaining how bronze, silver, and gold layers transform raw data into analytics‑ready assets governed through Unity Catalog. You’ll gain practical guidance on MDM linkages, survivorship rules, and entity resolution to ensure consistent master data across domains. It also covers real‑time and streaming pipelines that integrate seamlessly with the Lakehouse. A dedicated focus is placed on self‑service analytics, showing how governed data products empower business users to explore, analyze, and derive insights independently with confidence. Finally, understand how GenAI accelerates platform development through automated code generation using tools like Claude Code and Databricks Genie Code, enabling faster pipeline creation, governance, and analytics delivery.
What you will learn
Identify the traits of future‑ready platforms: data mesh, automation, observability
Design trusted data products with contracts and governance
Build Lakehouses with medallion architecture: bronze, silver, gold
Apply Unity Catalog for governance and endorsed datasets
Implement MDM using linkages, survivorship, and entity resolution
Develop real‑time and streaming pipelines at scale
Enable governed self‑service analytics for business users
Use GenAI to generate code with Claude and Databricks Genie
Who this book is for
This book is crafted for aspiring data and AI/ML architects, engineers, and analysts starting their data engineering journey and seeking a practical, hands‑on guide to building scalable, cloud‑driven data platforms. It’s ideal for professionals familiar with PySpark who want to design modern Lakehouse architectures using Delta Lake, while learning MDM, data mesh, AI enablement, streaming pipelines, automation, and data observability. A working knowledge of Python, Spark, and SQL is expected.
Table of Contents
- The Story of Data Engineering and Analytics
- Discovering Storage and Compute in Lakehouses
- Data Engineering on Microsoft Azure
- Designing Future Data Platforms
- Databricks, Medallion Architecture & Delta Lake
- Understanding Modern Data Pipelines
- Data Collection Stage – The Bronze Layer
- Data Curation Stage – The Silver Layer
- Data Aggregation Stage – The Gold Layer
- Next-Gen Data Analytics with Generative AI
- Data Observability
- Data Governance
About the Author
Manoj Kukreja is a Principal Architect at Northbay Solutions who specializes in creating complex Data Lakes and Data Analytics Pipelines for large-scale organizations such as banks, insurance companies, universities, and US/Canadian government agencies. Previously, he worked for Pythian, a large managed service provider, where he led the MySQL and MongoDB DBA group and supported large-scale data infrastructure for enterprises across the globe. With over 25 years of IT experience, he has delivered Data Lake solutions using all major cloud providers, including AWS, Azure, GCP, and Alibaba Cloud. On weekends, he trains groups of aspiring Data Engineers and Data Scientists on Hadoop, Spark, Kafka, and Data Analytics on AWS and Azure Cloud.