About the Book
This book provides a systematic overview of the mainstream technical approaches in embodied intelligence, constructing a comprehensive knowledge framework. It offers in-depth explanations of key embodied AI technologies, including perception, navigation, manipulation, and planning, as well as multi-agent interaction, communication, and cooperation. By doing so, it establishes a structured learning architecture for researchers, engineers, and students in the fields of robotics and embodied AI.
The book has eight chapters that progress from basic to advanced concepts, covering both theory and practical methods. Chapter 2 covers the foundational technologies of embodied intelligence, including 3D spatial representation, reinforcement learning methods, and large model technologies. These technologies form the theoretical backbone of current embodied intelligence algorithms, and a solid understanding of them is a prerequisite for the later chapters. Chapters 3 to 7 introduce the key technical systems of embodied intelligence along five dimensions: perception, navigation, manipulation, planning, and collaboration. Chapter 8 is practice-oriented and focuses on implementing embodied intelligence on simulation platforms.
The translation was done with the help of artificial intelligence. A subsequent human revision focused primarily on content.
Table of Contents:
Chapter 1 Overview of Embodied Intelligence
Chapter 2 Embodied Intelligence Fundamental Technologies
Chapter 3 Perception and Environmental Understanding
Chapter 4 Visually Augmented Navigation
Chapter 5 Vision-Assisted Manipulation Technology
Chapter 6 Vision-Driven Task Planning
Chapter 7 Multi-Agent Interaction
Chapter 8 Introduction to Simulation Platforms
About the Authors:
Dr. Ruimao Zhang is an Associate Professor at the School of Electronics and Communication Engineering, Sun Yat-sen University, and a high-level overseas talent in Shenzhen. His research focuses on computer vision, robotic vision, and multimodal large models. In recent years, the core objective of his research group has been to develop "embodied intelligent agents capable of effective human interaction in dynamic environments." As of April 2026, he has published over 90 papers in artificial intelligence journals and conferences, his work has been cited more than 10,000 times, and he holds over 10 authorized patents. He was named an Outstanding Reviewer for NeurIPS in 2021, has served as an Area Chair for ICLR 2026 and NeurIPS 2026, and is an editorial board member of ACM ToMM. As a key team member, he won a gold medal in the 8M YouTube Video Analysis Challenge and the championship in the AIM Learnable Image Processing Challenge. He has led and participated in projects funded by the National Natural Science Foundation of China, the Ministry of Science and Technology's Key R&D Program, and the Guangdong Natural Science Foundation, among others.
Prof. Liang Lin is an internationally renowned scholar in the field of artificial intelligence (IEEE Fellow, IAPR Fellow, IET Fellow). He is the director of the Multi-Agent and Embodied Intelligence Institute at Pengcheng National Laboratory and a full professor at Sun Yat-sen University. He is a recipient of the National Science Fund for Distinguished Young Scholars and Chief Scientist of the National Major Project on Artificial Intelligence. He has achieved a series of breakthroughs and innovative results in multimodal representation learning, causal inference, and embodied intelligence. As of Oct. 2024, he has published over 400 papers, which have been cited more than 40,000 times, and has received five Best Paper Awards. He has been awarded the First Prize of Natural Science in Heilongjiang Province, the Wu Wenjun Award for Artificial Intelligence (Natural Science Category), and the First Prize of the Science and Technology Award from the Chinese Society of Image and Graphics. He has also supervised students who won the CCF Outstanding Doctoral Thesis Award, the ACM China Outstanding Doctoral Thesis Award, and the CAAI Outstanding Doctoral Thesis Award. He previously served as the Executive Dean of the SenseTime Research Institute and has incubated Tuoyuan Intelligence, a leading next-generation artificial intelligence company. He and his team follow an approach that integrates industry, academia, and research, focusing on developing an embodied general large model that integrates perception, planning, and execution.