Discovering Knowledge in Data: An Introduction to Data Mining

Name: Discovering Knowledge in Data: An Introduction to Data Mining
Brand: John Wiley & Sons Inc
SKU: 0470361352
Price: 84 AED
Availability: Out of Stock
ISBN: 9780470361351

Format: Digital (delivered electronically) | Released: 07 Feb 2008

By: Daniel T. Larose (Author) | Publisher: John Wiley & Sons Inc | Publisher Imprint: Wiley-Interscience

About the Book

Learn data mining by doing data mining.

Data mining can be revolutionary, but only when it's done right. The powerful black-box data mining software now available can produce disastrously misleading results unless applied by a skilled and knowledgeable analyst. Discovering Knowledge in Data: An Introduction to Data Mining provides both the practical experience and the theoretical insight needed to reveal valuable information hidden in large data sets. Employing a "white box" methodology and real-world case studies, this step-by-step guide walks readers through the various algorithms and statistical structures that underlie the software and presents examples of their operation on actual large data sets.

Principal topics include:
* Data preprocessing and classification
* Exploratory analysis
* Decision trees
* Neural and Kohonen networks
* Hierarchical and k-means clustering
* Association rules
* Model evaluation techniques

Complete with scores of screenshots and diagrams to encourage graphical learning, Discovering Knowledge in Data: An Introduction to Data Mining gives students in business, computer science, and statistics, as well as professionals in the field, the power to turn any data warehouse into actionable knowledge. An Instructor's Manual presenting detailed solutions to all the problems in the book is available online.

Table of Contents:
PREFACE

1 INTRODUCTION TO DATA MINING
  What Is Data Mining?
  Why Data Mining?
  Need for Human Direction of Data Mining
  Cross-Industry Standard Process: CRISP–DM
  Case Study 1: Analyzing Automobile Warranty Claims: Example of the CRISP–DM Industry Standard Process in Action
  Fallacies of Data Mining
  What Tasks Can Data Mining Accomplish?
  Description
  Estimation
  Prediction
  Classification
  Clustering
  Association
  Case Study 2: Predicting Abnormal Stock Market Returns Using Neural Networks
  Case Study 3: Mining Association Rules from Legal Databases
  Case Study 4: Predicting Corporate Bankruptcies Using Decision Trees
  Case Study 5: Profiling the Tourism Market Using k-Means Clustering Analysis
  References
  Exercises

2 DATA PREPROCESSING
  Why Do We Need to Preprocess the Data?
  Data Cleaning
  Handling Missing Data
  Identifying Misclassifications
  Graphical Methods for Identifying Outliers
  Data Transformation
  Min–Max Normalization
  Z-Score Standardization
  Numerical Methods for Identifying Outliers
  References
  Exercises

3 EXPLORATORY DATA ANALYSIS
  Hypothesis Testing versus Exploratory Data Analysis
  Getting to Know the Data Set
  Dealing with Correlated Variables
  Exploring Categorical Variables
  Using EDA to Uncover Anomalous Fields
  Exploring Numerical Variables
  Exploring Multivariate Relationships
  Selecting Interesting Subsets of the Data for Further Investigation
  Binning
  Summary
  References
  Exercises

4 STATISTICAL APPROACHES TO ESTIMATION AND PREDICTION
  Data Mining Tasks in Discovering Knowledge in Data
  Statistical Approaches to Estimation and Prediction
  Univariate Methods: Measures of Center and Spread
  Statistical Inference
  How Confident Are We in Our Estimates?
  Confidence Interval Estimation
  Bivariate Methods: Simple Linear Regression
  Dangers of Extrapolation
  Confidence Intervals for the Mean Value of y Given x
  Prediction Intervals for a Randomly Chosen Value of y Given x
  Multiple Regression
  Verifying Model Assumptions
  References
  Exercises

5 k-NEAREST NEIGHBOR ALGORITHM
  Supervised versus Unsupervised Methods
  Methodology for Supervised Modeling
  Bias–Variance Trade-Off
  Classification Task
  k-Nearest Neighbor Algorithm
  Distance Function
  Combination Function
  Simple Unweighted Voting
  Weighted Voting
  Quantifying Attribute Relevance: Stretching the Axes
  Database Considerations
  k-Nearest Neighbor Algorithm for Estimation and Prediction
  Choosing k
  Reference
  Exercises

6 DECISION TREES
  Classification and Regression Trees
  C4.5 Algorithm
  Decision Rules
  Comparison of the C5.0 and CART Algorithms Applied to Real Data
  References
  Exercises

7 NEURAL NETWORKS
  Input and Output Encoding
  Neural Networks for Estimation and Prediction
  Simple Example of a Neural Network
  Sigmoid Activation Function
  Back-Propagation
  Gradient Descent Method
  Back-Propagation Rules
  Example of Back-Propagation
  Termination Criteria
  Learning Rate
  Momentum Term
  Sensitivity Analysis
  Application of Neural Network Modeling
  References
  Exercises

8 HIERARCHICAL AND k-MEANS CLUSTERING
  Clustering Task
  Hierarchical Clustering Methods
  Single-Linkage Clustering
  Complete-Linkage Clustering
  k-Means Clustering
  Example of k-Means Clustering at Work
  Application of k-Means Clustering Using SAS Enterprise Miner
  Using Cluster Membership to Predict Churn
  References
  Exercises

9 KOHONEN NETWORKS
  Self-Organizing Maps
  Kohonen Networks
  Example of a Kohonen Network Study
  Cluster Validity
  Application of Clustering Using Kohonen Networks
  Interpreting the Clusters
  Cluster Profiles
  Using Cluster Membership as Input to Downstream Data Mining Models
  References
  Exercises

10 ASSOCIATION RULES
  Affinity Analysis and Market Basket Analysis
  Data Representation for Market Basket Analysis
  Support, Confidence, Frequent Itemsets, and the A Priori Property
  How Does the A Priori Algorithm Work (Part 1)? Generating Frequent Itemsets
  How Does the A Priori Algorithm Work (Part 2)? Generating Association Rules
  Extension from Flag Data to General Categorical Data
  Information-Theoretic Approach: Generalized Rule Induction Method
  J-Measure
  Application of Generalized Rule Induction
  When Not to Use Association Rules
  Do Association Rules Represent Supervised or Unsupervised Learning?
  Local Patterns versus Global Models
  References
  Exercises

11 MODEL EVALUATION TECHNIQUES
  Model Evaluation Techniques for the Description Task
  Model Evaluation Techniques for the Estimation and Prediction Tasks
  Model Evaluation Techniques for the Classification Task
  Error Rate, False Positives, and False Negatives
  Misclassification Cost Adjustment to Reflect Real-World Concerns
  Decision Cost/Benefit Analysis
  Lift Charts and Gains Charts
  Interweaving Model Evaluation with Model Building
  Confluence of Results: Applying a Suite of Models
  Reference
  Exercises

EPILOGUE: "WE'VE ONLY JUST BEGUN"

INDEX

About the Author:
DANIEL T. LAROSE received his PhD in statistics from the University of Connecticut. An associate professor of statistics at Central Connecticut State University, he developed and directs Data Mining@CCSU, the world's first online master of science program in data mining. He has also worked as a data mining consultant for Connecticut-area companies. He is currently working on the next two books of his three-volume series on data mining, Data Mining Methods and Models and Data Mining the Web: Uncovering Patterns in Web Content, scheduled for publication in 2005 and 2006, respectively.

Reviews:
"...an excellent introductory book of data mining. I recommend it for every one who wants to learn data mining." (Journal of Statistical Software, May 2006)

"...selected material is described in a simple, clear, and…precise way...case studies…examples, and screen shots has definitely added to the learning value of the book." (Journal of Biopharmaceutical Statistics, January/February 2006)

"...does a good job introducing data mining to novices...it skillfully previews some of the basic statistical issues needed to understand data mining techniques." (Journal of the American Statistical Association, December 2005)

"If you need a book to help colleagues understand your data mining procedures and results, this is the one you want to give them." (Technometrics, November 2005)

"…an excellent book…it should be useful for anyone interested in analysing epidemiological data." (Statistics in Medical Research, October 2005)

"...an excellent 'white-box' overview of established approaches for data analysis, in which readers are shown how, why, and when the methods work." (CHOICE, April 2005)

"Larose has the making of a good series of books on data mining…I, for one, look forward to the next two books in the series." (Computing Reviews.com, February 15, 2005)

Product Details

ISBN-13: 9780470361351
Publisher: John Wiley & Sons Inc
Publisher Imprint: Wiley-Interscience
Language: English
Subtitle: An Introduction to Data Mining
ISBN-10: 0470361352
Publication Date: 07 Feb 2008
Binding: Digital (delivered electronically)
No. of Pages: 222

Related Categories

Computing and Information Technology > Databases > Data mining
