OToPS/statistical analyses

Statistical Analyses

These pages will give examples and detailed treatments of specific topics (similar to the "call out" boxes in some textbooks).

The examples will mostly use data from the social sciences (psychology in particular), and they will emphasize practical issues in application. There will be a lot of "rules of thumb" that are pragmatic even when not strictly precise.

Descriptive statistics

Using the median, mean, and mode as quick diagnostics

When we first learn about measures of central tendency, such as the mean, median, and mode, it usually is in the context of the normal distribution. Most of the statistical methods we learn first in psychology also assume that we can use the mean as a starting point for describing a set of scores.

As it turns out, in many areas of psychology, most of the things that we are interested in measuring don't have a normal distribution -- not in nature, and not in our samples.

Measures of symptoms, such as the depression scale we have in the teaching data set, tend to have a mode at or near the lowest possible score, whereas the mean gets pulled much higher by the extreme high scores. This is a "positively skewed" distribution. Scores like these are often better described by count distributions -- Poisson, negative binomial, or zero-inflated models -- than by the normal.
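
As a quick sketch of that diagnostic, the Python snippet below compares the mean, median, and mode on simulated, positively skewed scores. The data and the column name (cesd_total) are hypothetical stand-ins, not the actual teaching data set.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Simulated, positively skewed symptom scores (hypothetical stand-in for the
# depression scale): many low scores, a long right tail.
df = pd.DataFrame({"cesd_total": rng.negative_binomial(n=2, p=0.3, size=500)})

scores = df["cesd_total"]
print("mode:  ", scores.mode().iloc[0])  # typically at or near the minimum
print("median:", scores.median())        # somewhat higher
print("mean:  ", scores.mean())          # pulled up by the extreme high scores

# Rule of thumb: if mean > median > mode, suspect positive skew;
# if all three are close, the distribution may be roughly symmetric.
```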

Using the range to check whether data imported correctly, to find keypunch errors, and to decide about the span over which inferences are accurate.
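
A minimal sketch of that kind of range check, assuming a pandas DataFrame and hypothetical variable names and valid bounds (adjust both to the actual codebook):

```python
import pandas as pd

# Hypothetical valid ranges for a few variables.
valid_ranges = {"age": (11, 18), "cesd_total": (0, 60), "sex": (0, 1)}

def range_check(df: pd.DataFrame, limits: dict) -> pd.DataFrame:
    """Report the observed min/max next to the allowed range for each variable."""
    rows = []
    for col, (lo, hi) in limits.items():
        rows.append({
            "variable": col,
            "observed_min": df[col].min(),
            "observed_max": df[col].max(),
            "allowed": f"{lo}-{hi}",
            # Values outside the allowed range suggest keypunch or import errors.
            "out_of_range": ((df[col] < lo) | (df[col] > hi)).sum(),
        })
    return pd.DataFrame(rows)

# Example usage:
# df = pd.read_csv("teaching_data.csv")
# print(range_check(df, valid_ranges))
```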

Missing data -- notes about how missing values are coded and handled, and how handling differs when moving between software packages. Procedures for checking. A "UseVariable" example for listwise deletion (see the sketch below).
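
One way to sketch this in Python, assuming hypothetical missing-value codes (-9, -99, 999) and hypothetical variable names; the "UseVariable" idea is mirrored here by the use_variables argument, not by any built-in pandas feature:

```python
import numpy as np
import pandas as pd

# Hypothetical codes that some packages export as literal numbers rather than blanks.
MISSING_CODES = [-9, -99, 999]

def recode_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Replace numeric missing-value codes with NaN so pandas treats them as missing."""
    return df.replace(MISSING_CODES, np.nan)

def listwise_deletion(df: pd.DataFrame, use_variables: list[str]) -> pd.DataFrame:
    """Keep only rows with complete data on the analysis variables
    (analogous to the 'UseVariable' approach to listwise deletion)."""
    return df.dropna(subset=use_variables)

# Checking procedure: how much missingness per variable after recoding?
# df = recode_missing(pd.read_csv("teaching_data.csv"))
# print(df.isna().mean().sort_values(ascending=False))
# analysis_df = listwise_deletion(df, ["cesd_total", "age", "sex"])
```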

Exploratory data analysis

Going beyond stem-and-leaf plots, box-and-whisker plots, and histograms to get a sense of distributions: beeswarms, dotplots, jittering, and brushing.
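
A small sketch with seaborn (one plausible toolkit; the data and column names are hypothetical):

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical data: skewed symptom scores in two groups.
df = pd.DataFrame({
    "group": np.repeat(["control", "clinical"], 150),
    "cesd_total": np.concatenate([
        rng.negative_binomial(2, 0.4, 150),
        rng.negative_binomial(4, 0.2, 150),
    ]),
})

fig, axes = plt.subplots(1, 2, figsize=(9, 4), sharey=True)
# Jittered dot plot: every observation stays visible, unlike a box plot alone.
sns.stripplot(data=df, x="group", y="cesd_total", jitter=0.25, alpha=0.5, ax=axes[0])
axes[0].set_title("Jittered strip plot")
# Beeswarm: points packed so they do not overlap, showing the distribution's shape.
sns.swarmplot(data=df, x="group", y="cesd_total", size=3, ax=axes[1])
axes[1].set_title("Beeswarm")
plt.tight_layout()
plt.show()
```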

Multivariate exploration: scatterplots and scatterplot matrices; mapping additional variables onto dot attributes (shape, size, color, fill); paneling and trellis plots.
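
A sketch of a scatterplot matrix, attribute mapping, and paneling with seaborn, again using hypothetical data and variable names:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n = 200
# Hypothetical data: two symptom scales, age, and a grouping variable.
df = pd.DataFrame({
    "depression": rng.negative_binomial(3, 0.3, n),
    "anxiety": rng.negative_binomial(3, 0.35, n),
    "age": rng.integers(11, 19, n),
    "group": rng.choice(["control", "clinical"], n),
})

# Scatterplot matrix, with color mapping the grouping variable.
sns.pairplot(df, hue="group", corner=True)

# Paneling / trellis: the same scatterplot repeated across facets,
# with point size mapping a third variable.
sns.relplot(data=df, x="depression", y="anxiety", col="group",
            size="age", kind="scatter", alpha=0.6)
plt.show()
```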

Inference

Two by two: FP (Type I) & FN (Type II) error
Bias/variance in machine learning
Multiple testing and Type I error rates (Libin 3x and 10x idea)
Reproducibility problems (and registration)
Bonferroni and "protection"
Power
Effect sizes
Deciding frame: discovery vs. confirmation, liberal vs. conservative analysis
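
A sketch of the multiple-testing and power pieces, using statsmodels' Bonferroni adjustment on hypothetical p-values:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests
from statsmodels.stats.power import TTestIndPower

# Hypothetical p-values from, say, 10 exploratory tests in a discovery-mode analysis.
p_values = np.array([0.001, 0.008, 0.020, 0.049, 0.051, 0.12, 0.30, 0.45, 0.62, 0.88])

# Bonferroni "protection": control the family-wise Type I error rate at alpha = .05.
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="bonferroni")
for p, p_adj, keep in zip(p_values, p_adjusted, reject):
    print(f"p = {p:.3f} -> adjusted p = {p_adj:.3f}  {'significant' if keep else 'n.s.'}")

# Power: sample size per group needed to detect a medium effect (d = 0.5) at 80% power,
# using the Bonferroni-adjusted alpha for 10 tests.
n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05 / 10, power=0.80)
print(f"n per group for d = 0.5 at alpha = .005: {n_per_group:.0f}")

# Trade-off: the correction lowers the false-positive (Type I) rate but raises the
# false-negative (Type II) rate, which is why a confirmatory analysis should be
# planned with power and expected effect sizes in mind.
```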

Codebooks and meta-data

Study Scope

Slicing the salami? "Least publishable units" (LPUs) vs. packing too much into one paper.