About the Book
Requiring no prior training, Modern Statistics for the Social and Behavioral Sciences provides a two-semester, graduate-level introduction to basic statistical techniques that takes into account recent advances and insights that are typically ignored in an introductory course.
Hundreds of journal articles make it clear that basic techniques, routinely taught and used, can perform poorly when dealing with skewed distributions, outliers, heteroscedasticity (unequal variances) and curvature. Methods for dealing with these concerns have been derived and can provide a deeper, more accurate and more nuanced understanding of data. A conceptual basis is provided for understanding when and why standard methods can have poor power and yield misleading measures of effect size. Modern techniques for dealing with known concerns are described and illustrated.
Features:
Presents an in-depth description of both classic and modern methods
Explains and illustrates why recent advances can provide more power and a deeper understanding of data
Provides numerous illustrations using the software R
Includes an R package with over 1300 functions
Includes a solution manual giving detailed answers to all of the exercises
This second edition describes many recent advances relevant to basic techniques. For example, a vast array of new and improved methods is now available for dealing with regression, including substantially improved ANCOVA techniques. The coverage of multiple comparison procedures has been expanded and new ANOVA techniques are described.
Rand Wilcox is a professor of psychology at the University of Southern California. He is the author of 13 other statistics books and the creator of the R package WRS. He currently serves as an associate editor for five statistics journals. He is a fellow of the Association for Psychological Science and an elected member of the International Statistical Institute.
Table of Contents:
INTRODUCTION
SAMPLES VERSUS POPULATIONS
SOFTWARE
R BASICS
Entering Data
R Functions and Packages
Data Sets
Arithmetic Operations
NUMERICAL AND GRAPHICAL SUMMARIES OF DATA
BASIC SUMMATION NOTATION
MEASURES OF LOCATION
The Sample Mean
R Function mean
The Sample Median
R Function for the Median
A CRITICISM OF THE MEDIAN: IT MIGHT TRIM TOO MANY VALUES
R Function for the Tr
R Function winmean
What is a Measure of Location?
MEASURES OF VARIATION OR SCALE
Sample Variance and Standard Deviation
R Functions var and sd
The Interquartile Range
R Functions idealf and idealfIQR
Winsorized Variance
R Function winvar
Median Absolute Deviation
R Function mad
Average Absolute Distance from the Median
Other Robust Measures of Variation
R Functions bivar, pbvar, tauvar, and tbs
DETECTING OUTLIERS
A Method Based on the Mean and Variance
A Better Outlier Detection Rule: The MAD-Median Rule
R Function out
The Boxplot
R Function boxplot
Modifications of the Boxplot Rule for Detecting Outliers
R Function outbox
Other Measures of Location
R Functions mom and onestep
HISTOGRAMS
R Functions hist and splot
KERNEL DENSITY ESTIMATORS
R Functions kdplot and akerd
STEM-AND-LEAF DISPLAYS
R Function stem
SKEWNESS
Transforming Data
CHOOSING A MEASURE OF LOCATION
EXERCISES
PROBABILITY AND RELATED CONCEPTS
BASIC PROBABILITY
EXPECTED VALUES
CONDITIONAL PROBABILITY AND INDEPENDENCE
POPULATION VARIANCE
THE BINOMIAL PROBABILITY FUNCTION
R Functions dbinom and pbinom
CONTINUOUS VARIABLES AND THE NORMAL CURVE
Computing Probabilities Associated with Normal Curves
R Function pnorm
UNDERSTANDING THE EFFECTS OF NON-NORMALITY
Skewness
PEARSON’S CORRELATION AND THE POPULATION COVARIANCE (OPTIONAL)
Computing the Population Covariance and Pearson’s Correlation
SOME RULES ABOUT EXPECTED VALUES
CHI-SQUARED DISTRIBUTIONS
EXERCISES
SAMPLING DISTRIBUTIONS AND CONFIDENCE INTERVALS
RANDOM SAMPLING
SAMPLING DISTRIBUTIONS
Sampling Distribution of the Sample Mean
Computing Probabilities Associated with the Sample Mean
A CONFIDENCE INTERVAL FOR THE POPULATION MEAN
Known Variance
Confidence Intervals When σ Is Not Known
R Functions pt and qt
Confidence Interval for the Population Mean Using Student’s t
R Function t.test
JUDGING LOCATION ESTIMATORS BASED ON THEIR SAMPLING DISTRIBUTION
Trimming and Accuracy: Another Perspective
AN APPROACH TO NON-NORMALITY: THE CENTRAL LIMIT THEOREM
STUDENT’S T AND NON-NORMALITY
CONFIDENCE INTERVALS FOR THE TRIMMED MEAN
Estimating the Standard Error of a Trimmed Mean
R Function trimse
A Confidence Interval for the Population Trimmed Mean
R Function trimci
TRANSFORMING DATA
CONFIDENCE INTERVAL FOR THE POPULATION MEDIAN
R Function sint
Estimating the Standard Error of the Sample Median
R Function msmedse
More Concerns About Tied Values
A REMARK ABOUT MOM AND M-ESTIMATORS
CONFIDENCE INTERVALS FOR THE PROBABILITY OF SUCCESS
R Functions binomci, acbinomci and binomLCO
BAYESIAN METHODS
EXERCISES
HYPOTHESIS TESTING
THE BASICS OF HYPOTHESIS TESTING
P-Value or Significance Level
Criticisms of Two-Sided Hypothesis Testing and P-Values
Summary and Generalization
POWER AND TYPE II ERRORS
Understanding How n, α, and σ Are Related to Power
TESTING HYPOTHESES ABOUT THE MEAN WHEN σ IS NOT KNOWN
R Function t.test
CONTROLLING POWER AND DETERMINING THE SAMPLE SIZE
Choosing n Prior to Collecting Data
R Function power.t.test
Stein’s Method: Judging the Sample Size When Data Are Available
R Functions stein1 and stein2
PRACTICAL PROBLEMS WITH STUDENT’S T TEST
HYPOTHESIS TESTING BASED ON A TRIMMED MEAN
R Function trimci
R Functions stein1.tr and stein2.tr
TESTING HYPOTHESES ABOUT THE POPULATION MEDIAN
R Function sintv2
MAKING DECISIONS ABOUT WHICH MEASURE OF LOCATION TO USE
BOOTSTRAP METHODS
BOOTSTRAP-T METHOD
Symmetric Confidence Intervals
Exact Nonparametric Confidence Intervals for Means Are Impossible
THE PERCENTILE BOOTSTRAP METHOD
INFERENCES ABOUT ROBUST MEASURES OF LOCATION
Using the Percentile Method
R Functions onesampb, momci and trimpb
The Bootstrap-t Method Based on Trimmed Means
R Function trimcibt
ESTIMATING POWER WHEN TESTING HYPOTHESES ABOUT A TRIMMED MEAN
R Functions powt1est and powt1an
A BOOTSTRAP ESTIMATE OF STANDARD ERRORS
R Function bootse
EXERCISES
REGRESSION AND CORRELATION
THE LEAST SQUARES PRINCIPLE
CONFIDENCE INTERVALS AND HYPOTHESIS TESTING
Classic Inferential Techniques
Multiple Regression
R Functions ols and lm
STANDARDIZED REGRESSION
PRACTICAL CONCERNS ABOUT LEAST SQUARES REGRESSION AND HOW THEY MIGHT BE ADDRESSED
The Effect of Outliers on Least Squares Regression
Beware of Bad Leverage Points
Beware of Discarding Outliers Among the Y Values
Do Not Assume Homoscedasticity or that the Regression Line is Straight
Violating Assumptions When Testing Hypotheses
Dealing with Heteroscedasticity: The HC4 Method
R Functions olshc4 and hc4test
Interval Estimation of the Mean Response
R Function olshc4band
PEARSON’S CORRELATION AND THE COEFFICIENT OF DETERMINATION
A Closer Look at Interpreting r
TESTING H0: ρ = 0
R Function cor.test
R Function pwr.r.test
Testing H0: ρ = 0 When There is Heteroscedasticity
R Function pcorhc4
When Is It Safe to Conclude that Two Variables Are Independent?
A REGRESSION METHOD FOR ESTIMATING THE MEDIAN OF Y AND OTHER QUANTILES
R Function rqfit
DETECTING HETEROSCEDASTICITY
R Function khomreg
INFERENCES ABOUT PEARSON’S CORRELATION: DEALING WITH HETEROSCEDASTICITY
R Function pcorb
BOOTSTRAP METHODS FOR LEAST SQUARES REGRESSION
R Functions hc4wtest, olswbtest and lsfitci
DETECTING ASSOCIATIONS EVEN WHEN THERE IS CURVATURE
R Functions indt and medind
QUANTILE REGRESSION
R Functions qregci and rqtest
A Test for Homoscedasticity Using a Quantile Regression Approach
R Function qhomt
REGRESSION: WHICH PREDICTORS ARE BEST?
The 0.632 Bootstrap Method
R Function regpre
Least Angle Regression
R Function larsR
COMPARING CORRELATIONS
R Functions TWOpov and TWOpNOV
CONCLUDING REMARKS
EXERCISES
COMPARING TWO INDEPENDENT GROUPS
STUDENT’S T TEST
Choosing the Sample Sizes
R Function power.t.test
RELATIVE MERITS OF STUDENT’S T
WELCH’S HETEROSCEDASTIC METHOD FOR MEANS
R Function t.test
Tukey’s Three-Decision Rule
Non-normality and Welch’s Method
Three Modern Insights Regarding Methods for Comparing Means
METHODS FOR COMPARING MEDIANS AND TRIMMED MEANS
Yuen’s Method for Trimmed Means
R Functions yuen and fac2list
Comparing Medians
R Function msmed
PERCENTILE BOOTSTRAP METHODS FOR COMPARING MEASURES OF LOCATION
Using Other Measures of Location
Comparing Medians
R Function medpb2
Some Guidelines on When To Use the Percentile Bootstrap Method
R Functions trimpb2, med2g and pb2gen
BOOTSTRAP-T METHODS FOR COMPARING MEASURES OF LOCATION
Comparing Means
Bootstrap-t Method When Comparing Trimmed Means
R Functions yuenbt and yhbt
Estimating Power and Judging the Sample Sizes
R Functions powest and pow2an
PERMUTATION TESTS
RANK-BASED AND NONPARAMETRIC METHODS
Wilcoxon-Mann-Whitney Test
Handling Tied Values and Heteroscedasticity
Cliff’s Method
R Functions cid and cidv2
The Brunner–Munzel Method
R Function bmp
The Kolmogorov–Smirnov Test
R Function ks
Comparing All Quantiles Simultaneously: An Extension of the Kolmogorov–Smirnov Test
R Function sband
GRAPHICAL METHODS FOR COMPARING GROUPS
Error Bars
R Functions ebarplot and ebarplot.med
Plotting the Shift Function
Plotting the Distributions
R Function sumplot2g
Other Approaches
COMPARING MEASURES OF VARIATION
R Function comvar2
Brown-Forsythe Method
Comparing Robust Measures of Variation
MEASURING EFFECT SIZE
R Functions yuenv2 and akp.effect
COMPARING CORRELATIONS AND REGRESSION SLOPES
R Functions twopcor, twolsreg, and tworegwb
COMPARING TWO BINOMIALS
Storer–Kim Method
Beal’s Method
R Functions twobinom, twobici, bi2KMSv2 and power.prop.test
Comparing Two Discrete Distributions
R Function disc2com
MAKING DECISIONS ABOUT WHICH METHOD TO USE
EXERCISES
COMPARING TWO DEPENDENT GROUPS
THE PAIRED T TEST
When Does the Paired T Test Perform Well?
R Function t.test
COMPARING ROBUST MEASURES OF LOCATION
R Functions yuend, ydbt and dmedpb
Comparing Marginal M-Estimators
R Function rmmest
Measuring Effect Size
R Function D.akp.effect
HANDLING MISSING VALUES
R Functions rm2miss and rmmismcp
A DIFFERENT PERSPECTIVE WHEN USING ROBUST MEASURES OF LOCATION
R Functions loc2dif and l2drmci
THE SIGN TEST
WILCOXON SIGNED RANK TEST
R Function wilcox.test
COMPARING VARIANCES
R Function comdvar
COMPARING ROBUST MEASURES OF SCALE
R Function rmrvar
COMPARING ALL QUANTILES
R Function lband
PLOTS FOR DEPENDENT GROUPS
R Function g2plotdifxy
EXERCISES
ONE-WAY ANOVA
ANALYSIS OF VARIANCE FOR INDEPENDENT GROUPS
A Conceptual Overview
ANOVA via Least Squares Regression and Dummy Coding
R Functions anova, anova1, aov, and fac2list
Controlling Power and Choosing the Sample Sizes
R Functions power.anova.test and anova.power
DEALING WITH UNEQUAL VARIANCES
Welch’s Test
JUDGING SAMPLE SIZES AND CONTROLLING POWER WHEN DATA ARE AVAILABLE
R Functions bdanova1 and bdanova2
TRIMMED MEANS
R Functions t1way, t1wayv2, t1wayF and g5plot
Comparing Groups Based on Medians
R Function med1way
BOOTSTRAP METHODS
A Bootstrap-t Method
R Functions t1waybt and BFBANOVA
Two Percentile Bootstrap Methods
R Functions b1way, pbadepth and Qanova
Choosing a Method
RANDOM EFFECTS MODEL
A Measure of Effect Size
A Heteroscedastic Method
A Method Based on Trimmed Means
R Function rananova
RANK-BASED METHODS
The Kruskal-Wallis Test
R Function kruskal.test
Method BDM
R Functions bdm and bdmP
EXERCISES
TWO-WAY AND THREE-WAY DESIGNS
BASICS OF A TWO-WAY ANOVA DESIGN
Interactions
R Functions interaction.plot and interplot
Interactions When There Are More Than Two Levels
TESTING HYPOTHESES ABOUT MAIN EFFECTS AND INTERACTIONS
R Function anova
Inferences About Disordinal Interactions
The Two-Way ANOVA Model
HETEROSCEDASTIC METHODS FOR TRIMMED MEANS, INCLUDING MEANS
R Function t2way
BOOTSTRAP METHODS
R Functions pbad2way and t2waybt
TESTING HYPOTHESES BASED ON MEDIANS
R Function m2way
A RANK-BASED METHOD FOR A TWO-WAY DESIGN
R Function bdm2way
The Patel–Hoel Approach to Interactions
THREE-WAY ANOVA
R Functions anova and t3way
EXERCISES
COMPARING MORE THAN TWO DEPENDENT GROUPS
COMPARING MEANS IN A ONE-WAY DESIGN
R Function aov
COMPARING TRIMMED MEANS WHEN DEALING WITH A ONE-WAY DESIGN
R Functions rmanova and rmdat2mat
A Bootstrap-t Method for Trimmed Means
R Function rmanovab
PERCENTILE BOOTSTRAP METHODS FOR A ONE-WAY DESIGN
Method Based on Marginal Measures of Location
R Function bd1way
Inferences Based on Difference Scores
R Function rmdzero
RANK-BASED METHODS FOR A ONE-WAY DESIGN
Friedman’s Test
R Function friedman.test
Method BPRM
R Function bprm
COMMENTS ON WHICH METHOD TO USE
BETWEEN-BY-WITHIN DESIGNS
Method for Trimmed Means
R Functions bwtrim and bw2list
A Bootstrap-t Method
R Function tsplitbt
Inferences Based on M-estimators and Other Robust Measures of Location
R Functions sppba, sppbb, and sppbi
A Rank-Based Test
R Function bwrank
WITHIN-BY-WITHIN DESIGN
R Function wwtrim
THREE-WAY DESIGNS
R Functions bbwtrim, bwwtrim and wwwtrim
Data Management: R Functions bw2list and bbw2list
EXERCISES
MULTIPLE COMPARISONS
ONE-WAY ANOVA AND RELATED SITUATIONS, INDEPENDENT GROUPS
Fisher’s Least Significant Difference Method
The Tukey-Kramer Method
R Function TukeyHSD
Tukey-Kramer and the ANOVA F Test
Step-Down Methods
Dunnett’s T3
Games-Howell Method
Comparing Trimmed Means
R Functions lincon, stepmcp and twoKlin
Alternative Methods for Controlling FWE
Percentile Bootstrap Methods for Comparing Trimmed Means, Medians, and M-estimators
R Functions medpb, tmcppb, pbmcp and p.adjust
A Bootstrap-t Method
R Function linconbt
Rank-Based Methods
R Functions cidmul, cidmulv2, and bmpmul
Comparing the Individual Probabilities of Two Discrete Distributions
R Functions binband, splotg2, cumrelf and cumrelfT
Comparing the Quantiles of Two Independent Groups
R Functions qcomhd and qcomhdMC
Multiple Comparisons for Binomial and Categorical Data
R Functions skmcp and discmcp
TWO-WAY, BETWEEN-BY-BETWEEN DESIGN
Scheffé’s Homoscedastic Method
Heteroscedastic Methods
Extension of Welch–Šidák and Kaiser–Bowden Methods to Trimmed Means
R Function kbcon
R Functions con2way and conCON
Linear Contrasts Based on Medians
R Functions msmed and mcp2med
Bootstrap Methods
R Functions mcp2a and bbmcppb
The Patel-Hoel Rank-Based Interaction Method
R Function rimul
JUDGING SAMPLE SIZES
Tamhane’s Procedure
R Function tamhane
Hochberg’s Procedure
R Function hochberg
METHODS FOR DEPENDENT GROUPS
Linear Contrasts Based on Trimmed Means
R Function rmmcp
Comparing M-estimators
R Functions rmmcppb, dmedpb, dtrimpb and boxdif
Bootstrap-t Method
R Function bptd
Comparing the Quantiles of the Marginal Distributions
R Function Dqcomhd
BETWEEN-BY-WITHIN DESIGNS
R Functions bwmcp, bwamcp, bwbmcp, bwimcp, spmcpa, spmcpb, spmcpi, and bwmcppb
WITHIN-BY-WITHIN DESIGNS
Three-Way Designs
R Functions con3way, mcp3atm, and rm3mcp
Bootstrap Methods for Three-Way Designs
R Functions bbwmcp, bwwmcp, bwwmcppb, bbbmcppb, bbwmcppb, bwwmcppb, and wwwmcppb
EXERCISES
SOME MULTIVARIATE METHODS
LOCATION, SCATTER, AND DETECTING OUTLIERS
Detecting Outliers Via Robust Measures of Location and Scatter
R Functions cov.mve and cov.mcd
More Measures of Location and Covariance
R Functions rmba, tbs, and ogk
R Function out
A Projection-Type Outlier Detection Method
R Functions outpro, outproMC, outproad, outproadMC, and out3d
Skipped Estimators of Location
R Function smean
ONE-SAMPLE HYPOTHESIS TESTING
Comparing Dependent Groups
R Functions smeancrv2, hotel1, and rmdzeroOP
TWO-SAMPLE CASE
R Functions smean2, mat2grp, matsplit and mat2list
R Functions matsplit, mat2grp and mat2list
MANOVA
R Function manova
Robust MANOVA Based on Trimmed Means
R Functions MULtr.anova and MULAOVp
A MULTIVARIATE EXTENSION OF THE WILCOXON–MANN–WHITNEY TEST
Explanatory Measure of Effect Size: A Projection-Type Generalization
R Function mulwmwv2
RANK-BASED MULTIVARIATE METHODS
The Munzel–Brunner Method
R Function mulrank
The Choi–Marden Multivariate Rank Test
R Function cmanova
MULTIVARIATE REGRESSION
Multivariate Regression Using R
Robust Multivariate Regression
R Functions mlrreg and mopreg
PRINCIPAL COMPONENTS
R Functions prcomp and regpca
Robust Principal Components
R Functions outpca, robpca, robpcaS, Ppca, and Ppca.summary
EXERCISES
ROBUST REGRESSION AND MEASURES OF ASSOCIATION
ROBUST REGRESSION ESTIMATORS
The Theil–Sen Estimator
R Functions tsreg, tshdreg and regplot
Least Median of Squares
Least Trimmed Squares and Least Trimmed Absolute Value Estimators
R Functions lmsreg, ltsreg, and ltareg
M-Estimators
R Function chreg
Deepest Regression Line
R Function mdepreg
Skipped Estimators
R Functions opreg and opregMC
S-estimators and an E-Type Estimator
R Function tstsreg
COMMENTS ON CHOOSING A REGRESSION ESTIMATOR
INFERENCES BASED ON ROBUST REGRESSION ESTIMATORS
Testing Hypotheses About the Slopes
Inferences About the Typical Value of Y Given X
R Functions regtest, regtestMC, regci, regciMC, regYci and regYband
Comparing Measures of Location via Dummy Coding
DEALING WITH CURVATURE: SMOOTHERS
Cleveland’s Smoother
R Functions lowess, lplot, lplot.pred and lplotCI
Smoothers Based on Robust Measures of Location
R Functions rplot, rplotCIS, rplotCI, rplotCIv2, rplotCIM, rplot.pred, qhdsm and qhdsm.pred
Prediction When X Is Discrete: The R Function rundis
Seeing Curvature with More than Two Predictors
R Function prplot
Some Alternative Methods
Detecting Heteroscedasticity Using a Smoother
R Function rhom
SOME ROBUST CORRELATIONS AND TESTS OF INDEPENDENCE
Kendall’s tau
Spearman’s rho
Winsorized Correlation
R Function wincor
OP or Skipped Correlation
R Function scor
Inferences about Robust Correlations: Dealing with Heteroscedasticity
R Functions corb and scorci
MEASURING THE STRENGTH OF AN ASSOCIATION BASED ON A ROBUST FIT
COMPARING THE SLOPES OF TWO INDEPENDENT GROUPS
R Function reg2ci
TESTS FOR LINEARITY
R Functions lintest, lintestMC, and linchk
IDENTIFYING THE BEST PREDICTORS
Inferences Based on Independent Variables Taken in Isolation
R Functions regpord, ts2str, and sm2strv7
Inferences When Independent Variables Are Taken Together
R Function regIVcom
INTERACTIONS AND MODERATOR ANALYSES
R Functions olshc4.inter, ols.plot.inter, regci.inter, reg.plot.inter and adtest
Graphical Methods for Assessing Interactions
R Functions kercon, runsm2g, regi
ANCOVA
Classic ANCOVA
Robust ANCOVA Methods Based on a Parametric Regression Model
R Functions ancJN, ancJNmp, anclin, reg2plot and reg2g.p2plot
ANCOVA Based on the Running-Interval Smoother
R Functions ancsm, Qancsm, ancova, ancovaWMW, ancpb, ancovaUB, ancboot, ancdet, runmean2g, qhdsm2g and l2plot
R Functions Dancts, Dancols, Dancova, Dancovapb, DancovaUB and Dancdet
EXERCISES
BASIC METHODS FOR ANALYZING CATEGORICAL DATA
GOODNESS OF FIT
R Functions chisq.test and pwr.chisq.test
TEST OF INDEPENDENCE
R Function chi.test.ind
DETECTING DIFFERENCES IN THE MARGINAL PROBABILITIES
R Functions contab and mcnemar.test
MEASURES OF ASSOCIATION
The Proportion of Agreement
Kappa
Weighted Kappa
R Function Ckappa
LOGISTIC REGRESSION
R Functions glm and logreg
A Confidence Interval for the Odds Ratio
R Function ODDSR.CI
Smoothers for Logistic Regression
R Functions logrsm, rplot.bin, and logSM
EXERCISES
Appendix A: ANSWERS TO SELECTED EXERCISES
Appendix B: TABLES
Appendix C: BASIC MATRIX ALGEBRA
Index
About the Author:
Rand Wilcox has been a Professor of Psychology at the University of Southern California since 1987. He received his Ph.D. from the University of California, Santa Barbara in 1976. His research interests are statistical methods, particularly robust methods for comparing groups and studying associations. He also collaborates with researchers in occupational therapy, gerontology, biology and psychology. He is the author of four books.