Data mining can be defined as the process of selection, exploration and modelling of large databases, in order to discover models and patterns. The increasing availability of data in the current information society has led to the need for valid tools for its modelling and analysis. Data mining and applied statistical methods are the appropriate tools to extract such knowledge from data. Applications occur in many different fields, including statistics, computer science, machine learning, economics, marketing and finance. This book is the first to describe applied data mining methods in a consistent statistical framework, and then show how they can be applied in practice. All the methods described are either computational or of a statistical modelling nature. Complex probabilistic models and mathematical tools are not used, so the book is accessible to a wide audience of students and industry professionals. The second half of the book consists of nine case studies, taken from the author's own work in industry, that demonstrate how the methods described can be applied to real problems.
- Provides a solid introduction to applied data mining methods in a consistent statistical framework
- Includes coverage of classical, multivariate and Bayesian statistical methodology
- Includes many recent developments such as web mining, sequential Bayesian analysis and memory based reasoning
- Each statistical method described is illustrated with real life applications
- Features a number of detailed case studies based on applied projects within industry
- Incorporates discussion on software used in data mining, with particular emphasis on SAS
- Supported by a website featuring data sets, software and additional material
- Includes an extensive bibliography and pointers to further reading within the text
- Author has many years' experience teaching introductory and multivariate statistics and data mining, and working on applied projects within industry
A valuable resource for advanced undergraduate and graduate students of applied statistics, data mining, computer science and economics, as well as for professionals working in industry on projects involving large volumes of data, such as in marketing or financial risk management.
Table of Contents:
Preface.
1. Introduction.
PART I: METHODOLOGY.
2. Organisation of the data.
3. Exploratory data analysis.
4. Computational data mining.
5. Statistical data mining.
6. Evaluation of data mining methods.
PART II: BUSINESS CASES.
7. Market basket analysis.
8. Web clickstream analysis.
9. Profiling website visitors.
10. Customer relationship management.
11. Credit scoring.
12. Forecasting television audience.
Bibliography.
Index.
About the Author:
Paolo Giudici, Department of Economics and Quantitative Methods, University of Pavia. A lecturer in data mining, business statistics, data analysis and risk management, Professor Giudici is also the director of the data mining laboratory. He is the author of around 80 publications, the coordinator of two national research grants on data mining, and the local coordinator of a European integrated project on the topic. He was the sole author of the first edition of this book, which has been translated into both Italian and Chinese. He is also one of the Editors of Wiley's Series in Computational Statistics.
Review:
"…a book with many nice features that has elements of interest for …every subset of the intended audience…" (Journal of the American Statistical Association, September 2006)
"The author's style is consistently readable. Stripping out all but the barest essential mathematics makes the remaining material very approachable to a model-centric audience." (Technometrics, February 2005)