Statistical Data Analytics: Foundations for Data Mining, Informatics, and Knowledge Discovery

About the Book

A comprehensive introduction to statistical methods for data mining and knowledge discovery

Applications of data mining and ‘big data’ increasingly take center stage in our modern, knowledge-driven society, supported by advances in computing power, automated data acquisition, social media development, and interactive, linkable internet software. This book presents a coherent, technical introduction to modern statistical learning and analytics, starting from the core foundations of statistics and probability. It includes an overview of probability and statistical distributions, the basics of data manipulation and visualization, and the central components of standard statistical inference. Most of the text, however, extends beyond these introductory topics to supervised learning via linear regression, generalized linear models, and classification analytics. Finally, it introduces unsupervised learning via dimension reduction, cluster analysis, and market basket analysis. Extensive examples using actual data (with sample R programming code) illustrate diverse informatic sources in genomics, biomedicine, ecological remote sensing, astronomy, socioeconomics, marketing, advertising, and finance, among many others.

Statistical Data Analytics:
  • Focuses on methods critically used in data mining and statistical informatics.
  • Coherently describes the methods at an introductory level, with extensions to selected intermediate and advanced techniques.
  • Provides informative technical details for the highlighted methods.
  • Employs the open-source R language as the computational vehicle, along with its burgeoning collection of online packages, to illustrate many of the analyses contained in the book.
  • Concludes each chapter with a range of interesting and challenging homework exercises using actual data from a variety of informatic application areas.

This book will appeal as a classroom or training text to intermediate and advanced undergraduates and to beginning graduate students with sufficient background in calculus and matrix algebra. It will also serve as a source-book on the foundations of statistical informatics and data analytics for practitioners who regularly apply statistical learning to their modern data.

Table of Contents:
Preface
Part I Background: Introductory Statistical Analytics
1 Data analytics and data mining
  1.1 Knowledge discovery: finding structure in data
  1.2 Data quality versus data quantity
  1.3 Statistical modeling versus statistical description
2 Basic probability and statistical distributions
  2.1 Concepts in probability
    2.1.1 Probability rules
    2.1.2 Random variables and probability functions
    2.1.3 Means, variances, and expected values
    2.1.4 Median, quartiles, and quantiles
    2.1.5 Bivariate expected values, covariance, and correlation
  2.2 Multiple random variables∗
  2.3 Univariate families of distributions
    2.3.1 Binomial distribution
    2.3.2 Poisson distribution
    2.3.3 Geometric distribution
    2.3.4 Negative binomial distribution
    2.3.5 Discrete uniform distribution
    2.3.6 Continuous uniform distribution
    2.3.7 Exponential distribution
    2.3.8 Gamma and chi-square distributions
    2.3.9 Normal (Gaussian) distribution
    2.3.10 Distributions derived from normal
    2.3.11 The exponential family
3 Data manipulation
  3.1 Random sampling
  3.2 Data types
  3.3 Data summarization
    3.3.1 Means, medians, and central tendency
    3.3.2 Summarizing variation
    3.3.3 Summarizing (bivariate) correlation
  3.4 Data diagnostics and data transformation
    3.4.1 Outlier analysis
    3.4.2 Entropy∗
    3.4.3 Data transformation
  3.5 Simple smoothing techniques
    3.5.1 Binning
    3.5.2 Moving averages∗
    3.5.3 Exponential smoothing∗
4 Data visualization and statistical graphics
  4.1 Univariate visualization
    4.1.1 Strip charts and dot plots
    4.1.2 Boxplots
    4.1.3 Stem-and-leaf plots
    4.1.4 Histograms and density estimators
    4.1.5 Quantile plots
  4.2 Bivariate and multivariate visualization
    4.2.1 Pie charts and bar charts
    4.2.2 Multiple boxplots and QQ plots
    4.2.3 Scatterplots and bubble plots
    4.2.4 Heatmaps
    4.2.5 Time series plots∗
5 Statistical inference
  5.1 Parameters and likelihood
  5.2 Point estimation
    5.2.1 Bias
    5.2.2 The method of moments
    5.2.3 Least squares/weighted least squares
    5.2.4 Maximum likelihood∗
  5.3 Interval estimation
    5.3.1 Confidence intervals
    5.3.2 Single-sample intervals for normal (Gaussian) parameters
    5.3.3 Two-sample intervals for normal (Gaussian) parameters
    5.3.4 Wald intervals and likelihood intervals∗
    5.3.5 Delta method intervals∗
    5.3.6 Bootstrap intervals∗
  5.4 Testing hypotheses
    5.4.1 Single-sample tests for normal (Gaussian) parameters
    5.4.2 Two-sample tests for normal (Gaussian) parameters
    5.4.3 Wald tests, likelihood ratio tests, and ‘exact’ tests∗
  5.5 Multiple inferences∗
    5.5.1 Bonferroni multiplicity adjustment
    5.5.2 False discovery rate
Part II Statistical Learning and Data Analytics
6 Techniques for supervised learning: simple linear regression
  6.1 What is “supervised learning?”
  6.2 Simple linear regression
    6.2.1 The simple linear model
    6.2.2 Multiple inferences and simultaneous confidence bands
  6.3 Regression diagnostics
  6.4 Weighted least squares (WLS) regression
  6.5 Correlation analysis
    6.5.1 The correlation coefficient
    6.5.2 Rank correlation
7 Techniques for supervised learning: multiple linear regression
  7.1 Multiple linear regression
    7.1.1 Matrix formulation
    7.1.2 Weighted least squares for the MLR model
    7.1.3 Inferences under the MLR model
    7.1.4 Multicollinearity
  7.2 Polynomial regression
  7.3 Feature selection
    7.3.1 R_p^2 plots
    7.3.2 Information criteria: AIC and BIC
    7.3.3 Automated variable selection
  7.4 Alternative regression methods∗
    7.4.1 Loess
    7.4.2 Regularization: ridge regression
    7.4.3 Regularization and variable selection: the Lasso
  7.5 Qualitative predictors: ANOVA models
8 Supervised learning: generalized linear models
  8.1 Extending the linear regression model
    8.1.1 Nonnormal data and the exponential family
    8.1.2 Link functions
  8.2 Technical details for GLiMs∗
    8.2.1 Estimation
    8.2.2 The deviance function
    8.2.3 Residuals
    8.2.4 Inference and model assessment
  8.3 Selected forms of GLiMs
    8.3.1 Logistic regression and binary-data GLiMs
    8.3.2 Trend testing with proportion data
    8.3.3 Contingency tables and log-linear models
    8.3.4 Gamma regression models
9 Supervised learning: classification
  9.1 Binary classification via logistic regression
    9.1.1 Logistic discriminants
    9.1.2 Discriminant rule accuracy
    9.1.3 ROC curves
  9.2 Linear discriminant analysis (LDA)
    9.2.1 Linear discriminant functions
    9.2.2 Bayes discriminant/classification rules
    9.2.3 Bayesian classification with normal data
    9.2.4 Naïve Bayes classifiers
  9.3 k-Nearest neighbor classifiers
  9.4 Tree-based methods
    9.4.1 Classification trees
    9.4.2 Pruning
    9.4.3 Boosting
    9.4.4 Regression trees
  9.5 Support vector machines∗
    9.5.1 Separable data
    9.5.2 Nonseparable data
    9.5.3 Kernel transformations
10 Techniques for unsupervised learning: dimension reduction
  10.1 Unsupervised versus supervised learning
  10.2 Principal component analysis
    10.2.1 Principal components
    10.2.2 Implementing a PCA
  10.3 Exploratory factor analysis
    10.3.1 The factor analytic model
    10.3.2 Principal factor estimation
    10.3.3 Maximum likelihood estimation
    10.3.4 Selecting the number of factors
    10.3.5 Factor rotation
    10.3.6 Implementing an EFA
  10.4 Canonical correlation analysis∗
11 Techniques for unsupervised learning: clustering and association
  11.1 Cluster analysis
    11.1.1 Hierarchical clustering
    11.1.2 Partitioned clustering
  11.2 Association rules/market basket analysis
    11.2.1 Association rules for binary observations
    11.2.2 Measures of rule quality
    11.2.3 The Apriori algorithm
    11.2.4 Statistical measures of association quality
A Matrix manipulation
  A.1 Vectors and matrices
  A.2 Matrix algebra
  A.3 Matrix inversion
  A.4 Quadratic forms
  A.5 Eigenvalues and eigenvectors
  A.6 Matrix factorizations
    A.6.1 QR decomposition
    A.6.2 Spectral decomposition
    A.6.3 Matrix square root
    A.6.4 Singular value decomposition
  A.7 Statistics via matrix operations
B Brief introduction to R
  B.1 Data entry and manipulation
  B.2 A turbo-charged calculator
  B.3 R functions
    B.3.1 Inbuilt R functions
    B.3.2 Flow control
    B.3.3 User-defined functions
  B.4 R packages
References
Index

Product Details
  • ISBN-13: 9781119043171
  • Publisher: John Wiley & Sons Inc
  • Binding: Digital (delivered electronically)
  • No of Pages: 488
  • ISBN-10: 1119043174
  • Publication Date: 09 Oct 2015
  • Language: English
  • Subtitle: Foundations for Data Mining, Informatics, and Knowledge Discovery