Zach’s facts have been extracted from the book to remind you of the key concepts you and Zach have learned in each chapter.
Zach's Facts 12.1 Linearity and independent errors
- Linearity is the assumption that the outcome variable is, in reality, linearly related to any predictors.
- Additivity is the assumption that if you have several predictors then their combined effect is best described by adding their effects together.
- Violations of linearity and additivity mean that you are fitting the wrong model to your data if you use a linear model.
- The assumption of independent errors means that a given error in prediction from the model should not be related to, and therefore affected by, a different error in prediction.
- Violating the assumption of independence invalidates confidence intervals and significance tests of parameter estimates. The estimates themselves will be valid but not optimal if we use the method of least squares.
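The additive linear model that these facts describe can be sketched in code. This is a minimal illustration with simulated data (the predictor names and parameter values are arbitrary, chosen only to make the example concrete): the combined effect of two predictors on the outcome is the sum of their separate effects, and the residuals are the errors in prediction.

```python
import numpy as np

# Hypothetical simulated data; the true parameter values (2.0, 1.5, -0.5)
# are arbitrary illustrative choices.
rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Additivity: the combined effect of the predictors is the SUM of
# their separate effects.
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(scale=1.0, size=n)

# Least-squares estimates of the parameters (the bs)
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Errors in prediction; the model assumes these are independent of one another
residuals = y - X @ b
print(b)  # estimates should land close to [2.0, 1.5, -0.5]
```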
Zach's Facts 12.2 Homogeneity of variance/homoscedasticity
- Homogeneity of variance/homoscedasticity is the assumption that the spread of outcome scores is roughly equal at different points on the predictor variable.
- Homoscedasticity ensures that parameter estimates (b) using least squares methods are optimal, and that confidence intervals and p-values associated with those parameters are not biased.
- Look at a plot of the standardized predicted values from your model against the standardized residuals (zpred vs. zresid). If it has a funnel shape then you have trouble.
- When comparing groups, a significant Levene’s test (i.e., a p-value less than 0.05) reveals a problem with this assumption. However, there are good reasons not to use this test (Milton’s Meowsings 12.3).
- The variance ratio (Hartley’s Fmax) is the largest group variance divided by the smallest. This value needs to be smaller than the critical values in the additional material.
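The two numerical checks above, Levene's test and Hartley's Fmax, can be sketched as follows. This uses hypothetical group data built so that the variances genuinely differ; the resulting Fmax would still need to be compared against the critical values in the additional material.

```python
import numpy as np
from scipy import stats

# Hypothetical groups with deliberately unequal spreads
rng = np.random.default_rng(1)
group_a = rng.normal(loc=10, scale=1.0, size=30)
group_b = rng.normal(loc=10, scale=1.5, size=30)
group_c = rng.normal(loc=10, scale=3.0, size=30)

# Levene's test: p < 0.05 suggests the group variances are not equal
stat, p = stats.levene(group_a, group_b, group_c)
print(f"Levene's test: W = {stat:.2f}, p = {p:.3f}")

# Hartley's Fmax: largest group variance divided by the smallest
variances = [np.var(g, ddof=1) for g in (group_a, group_b, group_c)]
f_max = max(variances) / min(variances)
print(f"Variance ratio (Fmax) = {f_max:.2f}")
```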
Zach's Facts 12.3 Normality
- The assumption of normality relates to the sampling distribution or the distribution of residuals.
- Estimates of the model parameters (the bs) will be optimal using the method of least squares if the model residuals are normal.
- Confidence intervals and significance tests rely on the sampling distribution being normal, and we can assume this in large samples thanks to the central limit theorem.
- Look at a histogram of residuals from your model.
- A significant K-S test (i.e., a p-value less than 0.05) reveals a problem with this assumption. That is, the distribution of scores is significantly non-normal. However, there are good reasons not to use this test (Milton’s Meowsings 12.3).
- In large samples don’t worry about normality too much; in smaller samples use bootstrapping to get robust estimates of b.
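The two suggestions above can be sketched in code: a K-S test on the model residuals, and a percentile bootstrap for a parameter estimate b. The data are hypothetical, and note one caveat: estimating the normal distribution's mean and standard deviation from the residuals themselves (as done here) makes the K-S p-value only approximate.

```python
import numpy as np
from scipy import stats

# Hypothetical data for a simple linear model (true slope 2.0)
rng = np.random.default_rng(7)
n = 50
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Fit the model and get the residuals
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# K-S test against a normal distribution; p < 0.05 would flag
# significant non-normality (approximate, since the mean and sd
# are estimated from the same residuals)
d, p = stats.kstest(residuals, 'norm',
                    args=(residuals.mean(), residuals.std(ddof=1)))
print(f"K-S: D = {d:.3f}, p = {p:.3f}")

# Percentile bootstrap for the slope: resample cases with replacement,
# refit, and take the middle 95% of the bootstrap estimates
boot_slopes = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)
    b1, b0 = np.polyfit(x[idx], y[idx], 1)
    boot_slopes.append(b1)
lo, hi = np.percentile(boot_slopes, [2.5, 97.5])
print(f"Bootstrap 95% CI for b: [{lo:.2f}, {hi:.2f}]")
```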
Zach's Facts 12.4 Other assumptions of linear models
- External variables are ones that influence the outcome variable in your model but that you have not measured or included. We assume that there are no external variables that we should have included when we fit a linear model.
- All predictor variables in a linear model must be quantitative or dichotomous, and the outcome variable must be quantitative, continuous and unbounded. All variables must have some variance.
- Multicollinearity is when there is a strong relationship between two or more predictors in a linear model. Its presence makes parameter estimates for the model less trustworthy, limits the overall fit of the model, and makes it hard to ascertain the unique contribution of predictor variables to explaining variance in the outcome variable.
- A variance inflation factor (VIF) greater than 10 indicates potential multicollinearity.
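The VIF can be computed directly from its definition: regress each predictor on the remaining predictors and take 1 / (1 − R²). A minimal sketch with hypothetical data, where one predictor is deliberately built as a near-copy of another so the collinearity shows up:

```python
import numpy as np

# Hypothetical predictors: x3 is nearly a copy of x1, so x1 and x3
# should show inflated VIFs while x2 stays close to 1
rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + rng.normal(scale=0.1, size=n)
X = np.column_stack([x1, x2, x3])

def vif(X, i):
    """VIF for column i: regress it on the other columns, return 1/(1 - R^2)."""
    y = X[:, i]
    others = np.delete(X, i, axis=1)
    A = np.column_stack([np.ones(len(y)), others])  # add an intercept
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    r2 = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

for i in range(X.shape[1]):
    # Values greater than 10 indicate potential multicollinearity
    print(f"VIF for predictor {i + 1}: {vif(X, i):.1f}")
```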