As robots increasingly enter homes, hospitals, factories, classrooms, and public environments, natural and trustworthy communication between humans and autonomous agents becomes essential. Single-modality interfaces are no longer sufficient. Effective human–robot interaction requires the seamless integration of speech, vision, gesture, motion, probabilistic reasoning, and ethical decision-making.
Multimodal Communication in Human–Robot Interaction presents a comprehensive, systems-level framework for designing interactive robots capable of interpreting and generating coordinated multimodal behavior. Rather than treating speech processing, computer vision, motion planning, and artificial intelligence as isolated components, this textbook shows how they operate together within a perception–action cycle that enables robots to understand human intent, act safely in shared environments, and communicate their reasoning transparently.
Foundational and advanced topics are integrated, including spoken dialogue systems, wake-word detection, speaker verification, object recognition, gesture interpretation, audiovisual synchronization, probabilistic state estimation, cooperative control, socially aware navigation, and explainable artificial intelligence. A contribution of the book is its emphasis on multimodal robots and generative AI that not only interpret human signals but also produce aligned speech, gesture, motion, and explanation in context-sensitive and safety-aware ways.
Ethical considerations are treated as essential to robot design and interaction, and they become even more relevant as robotic systems gain autonomy through deep learning and large generative models. This textbook examines training bias, explainability, accountability, transparency, and the carbon footprint of AI systems. Questions regarding consciousness, sapience, and superintelligence are also discussed.
Table of Contents:
Chapter 1: Introduction
Chapter 2: Verbal and non-verbal communication using Speech and Written Text
Chapter 3: Gestures and image for communication
Chapter 4: Image, graphics and visualization based communication
Chapter 5: Multimodal Sensor Fusion
Chapter 6: Semantic Web and Ontology in Multimodal Interactive Human Machine Communication
Chapter 7: Co-operative Robot and Control in Human Robot Interaction
Chapter 8: Conclusion and future perspective
About the Authors:
Veton Këpuska is a computer engineer who received his PhD from Clemson University in 1990. Dr. Këpuska is the inventor of the Wake-Up-Word Speech Recognition method, a communication approach that enables machines to recognize activation commands. In 2003, he joined the Florida Institute of Technology, where he taught graduate-level courses that he developed in Speech Processing, Speech Recognition, and Natural Language Understanding. He has also taught numerous undergraduate courses, including programming and microcontroller systems. To demonstrate and explain key concepts, he uses, among other techniques, SASE_LAB, a MATLAB-based laboratory environment he developed.
Steven Liu studied electrical engineering and received his Dipl.-Ing. and Dr.-Ing. degrees in 1986 and 1992, respectively, from the Technische Universität Berlin. After various professional positions in industry and academic institutions, he joined the Technische Universität Kaiserslautern (now RPTU University of Kaiserslautern-Landau) as a full professor in 2004. His current research interests include control of mechatronic and power systems, cooperative robotics, and networked control.
Rosina Weber is a Professor of Information Science and Computer Science at Drexel University. Professor Weber is a leader in explainable artificial intelligence (XAI) and case-based reasoning. Her research has been funded by the NIH, DARPA, DHS, and international agencies. She has co-chaired multiple XAI workshops, and her scholarship appears in venues such as AI Magazine, Applied AI Letters, Expert Systems with Applications, Knowledge-Based Systems, AAAI, and ICCBR. She co-authored the Springer textbook “Case-Based Reasoning: A Textbook” with Prof. Michael M. Richter.
Marius Silaghi is a Professor of Computer Science at Florida Tech, where he leads research in distributed AI, speech and language engineering, and AI agents. He was educated at EPFL, ETHZ, and UTCN. He has held research visits at TU-Braunschweig, TU-Klagenfurth, University of Valenciennes, Aarhus University, and IDIAP Martigny, and has earned best paper awards or nominations at CP, IAT, CIA, and FLAIRS. He has delivered tutorials at IJCAI and AAMAS.
Edwin Nowicki is a professor emeritus of Electrical and Software Engineering at the University of Calgary and studied engineering at the University of Toronto. His research is in small renewable energy systems, and he once worked with a wonderful small group of researchers on a microhydro project to assist the village of Ghodasin, Jumla, Nepal. On two occasions he taught power electronics for renewable energy systems in Addis Ababa, Ethiopia.
Edward Kim received his Ph.D. in Computer Science from Lehigh University and an MSE and BSE from the University of Pennsylvania. He is currently an Associate Professor in the Department of Computer Science at Drexel University. Dr. Kim is an expert in computer vision, deep learning, and knowledge representation. He was the recipient of a DOE visiting faculty award in 2017 and a DoD visiting scholar award in 2018. Results of this work were the basis for his 2019 NSF CAREER award in Robust Intelligence and his 2021 POCUS AI DARPA project.
Michael Sintek is a senior consultant, research scientist, and innovator in Semantic Web technology (especially rule-based systems and ontologies) at the German Research Center for Artificial Intelligence (DFKI), Berlin, Germany. As a visiting researcher at Stanford University, he developed several well-known extensions of the ontology editor Protégé, as well as the rule-based Semantic Web language TRIPLE.
Sheuli Paul is a scientist at Defence Research and Development Canada. Her research spans interdisciplinary areas including signal processing, machine learning, artificial intelligence, and human–robot interaction. She earned her Dr.-Ing. degree in Electrical and Computer Engineering from the University of Kaiserslautern.