Automated Machine Learning for Data-centric Systems provides a system-oriented and knowledge-driven perspective on automated machine learning in modern data-centric environments. As machine learning models become core components of data management systems, the manual design and optimization of models increasingly limit scalability, reproducibility, and long-term adaptability. This book addresses these challenges by rethinking AutoML not merely as a collection of optimization algorithms, but as a foundational capability embedded within data-centric systems.
The book presents a unified framework that connects core AutoML techniques—such as hyperparameter optimization, combined algorithm selection and configuration, neural architecture search, and model compression—with system-level considerations and diverse data scenarios. It emphasizes how knowledge, experience, and structural properties of data can guide automation, enabling AutoML systems to move beyond blind search toward more efficient, interpretable, and sustainable model design. Through detailed discussions of temporal, sequential, graph, and federated data settings, the book demonstrates how AutoML techniques can be adapted to real-world constraints including data heterogeneity, resource limitations, and deployment complexity.
Designed for researchers, graduate students, and practitioners, this book bridges the gap between algorithm-centric AutoML research and the practical needs of data-centric systems. By integrating theoretical foundations with system-level insights and emerging research directions, Automated Machine Learning for Data-centric Systems serves as both a comprehensive reference and a forward-looking guide for building scalable, intelligent, and automated data-driven systems.
Table of Contents:
"Chapter 1: Introduction to AutoML for Data-Centric Systems".- "Chapter 2: Hyperparameter Optimization (HPO)".- "Chapter 3: Combined Algorithm Selection and Hyperparameter Optimization (CASH)".- "Chapter 4: Neural Architecture Search (NAS)".- "Chapter 5: Model Compression and Efficiency".- "Chapter 6: AutoML for Federated earning".- "Chapter 7: AutoML for Temporal and Sequential Data".- "Chapter 8: AutoML for Graph Learning".- "Chapter 9: Knowledge-Driven AutoML".- "Chapter 10: System Optimization for AutoML".- "Chapter 11: Future Directions and Open Challenges".
About the Authors:
Hongzhi Wang is a Tenured Professor, Director of the Department of Computer Science and Engineering, Director of the Massive Data Computing Research Center, Leader of the Data Science and Big Data Technology Program, Director of the Heilongjiang Key Laboratory of Big Data Science and Engineering, and Head of the Youth Scientist Studio at Harbin Institute of Technology (HIT). He also serves as Secretary-General of the Alumni Association of the School of Computing. He is a Distinguished Member of the China Computer Federation (CCF) and an IEEE Senior Member, and his research focuses on databases, big data management and analysis, and big data governance. He has published over 350 papers, with more than 100 indexed by SCI and over 5,000 citations. He has led more than 10 projects, including National Natural Science Foundation of China (NSFC) Key Projects and international cooperation programs, and has participated as a core member in several national, provincial, and ministerial key projects. His honors include Heilongjiang Provincial Teaching Master, Young Longjiang Scholar, and Microsoft Scholar. He is a member of the Heilongjiang "Leading Goose Program" team and has won one Heilongjiang Provincial Natural Science Award and one Ministry of Education University Science and Technology Progress Award.
Chunnan Wang, a PhD graduate from the Massive Data Computing Research Center of HIT, is currently a Senior Researcher at Tencent Singapore. Her doctoral research focused on automated machine learning (AutoML). She has published over 20 papers in top international journals and conferences such as TKDE, TKDD, ICDE, and CVPR, including 15 as first author. She has proposed effective AutoML solutions for diverse functional requirements, application scenarios, and machine learning tasks, lowering the barrier to adopting machine learning in practice. During her doctoral studies, she received the National Scholarship and the Tencent Scholarship and was selected as a Tencent Technology Master.
Tianyu Mu holds a Ph.D. in Computer Science and Technology from HIT and is currently a Senior Algorithm Engineer at Huawei Cloud Computing Technology Co., Ltd. His research focuses on Automated Machine Learning (AutoML), specializing in algorithm selection, hyperparameter optimization, and time series analysis. He aims to build efficient, scalable automated modeling systems using meta-learning and reinforcement learning. As first author, he has published papers in top-tier venues including VLDB, ICDE (2023/2024/2025), and Information Sciences. His representative contributions include TSC-AutoML (a meta-learning-based framework for time series classification) and ShrinkHPO (an explainable parallel hyperparameter optimization algorithm). At Huawei Cloud, he focuses on cloud-based hardware failure prediction systems and intelligent operation algorithms for large-scale cloud infrastructures.
Yusi Yang is a PhD candidate at HIT’s School of Computing Science and Technology. She has won awards in national and provincial competitions such as the "Datang Cup" National University Student Information and Communication Technology Competition.