Statistics and Data Foundations for AI is an interdisciplinary approach to statistical concepts and data foundations of AI with real-world illustrative examples from authoritative sources such as NASA, NOAA and the US Census Bureau. Co-authored by a data science research expert and an experienced educator, the book serves as a prequel to an AI and machine learning course.
Given the interdependence of data and AI, understanding data and using it responsibly to create and interact with AI tools requires a high level of statistical skill and data intuition. The book includes topics such as data management, exploratory data analysis, sampling, probability theory, hypothesis testing, multivariate analysis, data quality, ethics, data privacy, and responsible use of AI. Every key statistical concept is presented in the context of how it is used by AI applications in areas such as sports, fashion, climate science, environmental science, health, medicine and space exploration. The book makes AI relatable to everyday life so that it is no longer an abstraction. Instructor resources, supplementary materials, further reading and debate topics enable advanced study and deeper thinking.
Statistics and Data Foundations for AI is intended for undergraduate and graduate students, and practitioners interested in learning statistical foundations in relation to data and AI with application to real-world problems. The content is accessible to learners from a wide variety of backgrounds (STEM and non-STEM) without sacrificing rigor.
Table of Contents:
1. Statistics, Data and AI
2. About Data
3. Exploratory Data Analysis
4. Sampling: Less is More
5. Probability in the Age of AI
6. Data-driven Hypotheses Testing
7. Variable Relationships: The Full Picture
8. Data Quality and AI
9. What’s that AI
10. Responsible Use of Data and AI
About the Authors:
Tamraparni Dasu (Ph.D. Mathematical Statistics, University of Rochester, 1991) is a research scientist and Data Science expert specializing in computational statistics, machine learning and data quality. She retired in 2021 as Lead Inventive Scientist after 31 years at AT&T Bell Laboratories and now teaches Data Mining, Machine Learning and AI as an adjunct professor at Fairleigh Dickinson University, New Jersey. Dr. Dasu has published extensively in top tier journals and research conferences such as SIGMOD, KDD and VLDB. As an educator, Dr. Dasu is committed to mentoring the next generation of quantitative thinkers, computer scientists and data scientists.
Geetha Murthy (Ed.D. Instructional Leadership, St. John’s University, Queens, NY, 2015) is a highly experienced school administrator and math teacher whose research focused on describing and dismantling self-limiting beliefs that inhibit students from pursuing STEM education and careers. Most recently, Dr. Murthy served as the K-12 director of mathematics for Herricks School District in Long Island, NY, a high-performing public school district that made significant progress in student achievement under her leadership through initiatives focused on systemic changes to create and sustain greater equity, access and success for all students. Dr. Murthy is passionate about promoting 21st Century Skills in teaching and learning.
Reviews:
“Statistics and Data Foundations for AI provides a rigorous yet accessible introduction to the statistical theory and data analysis essential for understanding and using modern AI. The book is grounded in real-world examples from authoritative sources such as NASA and NOAA, and thereby connects core statistical concepts directly to contemporary AI applications. Its distinctive emphasis on data quality, ethics, bias, and the limitation of AI equips readers to engage critically and responsibly with AI systems. Suitable for both STEM and non-STEM audiences, the text supports hands-on learning through tools such as R, Excel, and Google Sheets, without assuming prior programming experience.”
Sivaramakrishnan “Bala” Balachandar, Chairman and William F. Powers Professor of Mechanical and Aerospace Engineering at the University of Florida
“This slender volume creatively fills the gap between Data Management, Statistics, and Artificial Intelligence, proving once again that less can be more. It embraces what many educators avoid: The use of Large Language Models (here: Google Gemini) as partners in education. Demonstrations and exercises are built on real, large datasets, not on the toy datasets from yesteryear. While being scientifically rigorous, it builds on the reader’s intuition, to the point of making “Data Intuition” a primary topic of discussion.”
James Geller, Professor and Chair, Department of Data Science, Ying Wu College of Computing, New Jersey Institute of Technology (NJIT)
“This engaging textbook, richly laden with real world examples and case studies, gives an intuitive yet thorough view of probability and statistics, and of how we can use them to understand the stochasticity in our world, the uncertainty in our knowledge, and the workings of artificial intelligence. This is a timely book that deserves a very wide audience.”
Sanjoy Dasgupta, Professor, Jacobs School of Engineering, University of California San Diego
“This amazing book is written both as a textbook for data science and AI learners and educators, as well as a reference for more experienced users. Using a pedagogical approach targeting adult learners, the authors have explained the various aspects of AI, statistical foundations of AI and LLMs, and provided examples, practice exercises, and links to additional learning from current sources. The book provides thoughtful questions for discussion, illustrates the strengths and weaknesses of AI, and engages the reader with descriptions and links to applications from a wide variety of fields. It is well-researched and written in accessible yet scholarly language. As a long-time teacher of statistics and now engaged in teaching with AI, I found the book extremely valuable. I would highly recommend this book for college instructors, individuals who are self-teaching AI use, and researchers seeking to extend their knowledge of AI applications in their work. Congratulations to Dasu and Murthy on this excellent book!”
Rene Parmar, Dean, School of Education, SUNY New Paltz
“At a time when artificial intelligence is often perceived as a “black box,” Statistics and Data Foundations for AI authored by Tamraparni Dasu and Geetha Murthy offers the essential map and compass needed to explore its inner mechanics. By positioning data as the fuel that drives every AI model—from training and learning to inference—the authors establish a robust foundation in data collection, storage, and management, addressing core concepts that are frequently underrepresented in traditional curricula.
What distinguishes this text is its forward-thinking pedagogy. The Data Debate features encourage students to engage deeply and thoughtfully with complex statistical concepts through structured, nonjudgmental discourse. Complementing these are the GeminiBox sections, which exemplify active learning in action—guiding students to craft precise AI prompts and, importantly, to evaluate model outputs critically for accuracy and potential “hallucinations.” Together, these elements cultivate both technical proficiency and data literacy essential for working with modern AI tools.
The book succeeds in bridging theory with real-world relevance. From analyzing multivariate factors influencing hurricane trajectories to addressing ethical considerations and sampling bias, the material remains grounded in authentic data and contemporary challenges. The integration of rich visuals, real datasets, and well-designed summative projects ensures that students not only understand core principles but can apply them to meaningful, data-driven inquiries.
Finally, the inclusion of a 125-year historical perspective on the co-evolution of AI and hardware is a masterstroke. It provides invaluable context, reminding readers that today’s AI revolution is the product of a century of innovation in computation and data science. Statistics and Data Foundations for AI is an indispensable resource for educators, students, and practitioners seeking to understand not just where AI stands today, but the data and statistical foundations that will shape its future.”
Kiron Sharma, Professor of Computer Science, Interim Chair, Dept. Computer Science & Mathematics, Olsen College of Engineering and Science, Fairleigh Dickinson University