Chapter Summary

This chapter presents methods of statistical analysis of relationships between two variables. The relationship between variables lies at the heart of empirical social science inquiry.

Two variables that covary are said to be related, but not all variables that covary represent a causal relationship. You can describe the relationship between two variables in terms of direction (positive or negative), form (general, linear, nonlinear, or monotonic), and strength.

One of the best places to start when assessing a relationship is a graph, such as a scatter plot or a matrix plot, which can reveal the direction, form, and strength of the relationship.

Measures of association indicate the extent to which two variables are related. There are many measures of association, and each is designed for particular levels of measurement (nominal, ordinal, interval, or ratio), usually not for all of them.

  • Cross tabulations are for use with nominal and ordinal level variables.
  • Proportional-reduction-in-error interpretation (lambda) is for use when at least one variable in a relationship is measured at the nominal level.
  • Kendall’s tau b, Kendall’s tau c, Somers’ d, and Goodman and Kruskal’s gamma are for use with ordinal level variables. These statistics are computed from the concordant pairs, discordant pairs, and ties in a contingency table.
  • Analysis of variance (ANOVA) is appropriate when one variable is nominal or ordinal and the other is interval or ratio.
    • You should use an F-test to determine statistical significance with ANOVA.
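
As an illustration of how concordant and discordant pairs feed an ordinal measure, here is a minimal Python sketch of Goodman and Kruskal's gamma, computed as (C - D) / (C + D) and ignoring ties. The function names and the 2x2 table of counts are hypothetical and are not taken from the chapter; the table's rows and columns are assumed to be in ascending category order.

```python
def concordant_discordant(table):
    """Count concordant and discordant pairs in an ordered contingency table."""
    concordant = 0
    discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            # Compare cell (i, j) with every cell in a lower row.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:    # higher on both variables: concordant
                        concordant += table[i][j] * table[k][m]
                    elif m < j:  # higher on one, lower on the other: discordant
                        discordant += table[i][j] * table[k][m]
    return concordant, discordant


def gamma(table):
    """Goodman and Kruskal's gamma; tied pairs are left out of the calculation."""
    c, d = concordant_discordant(table)
    return (c - d) / (c + d)


# Hypothetical counts: rows = low/high on X, columns = low/high on Y.
example_table = [[10, 5],
                 [5, 10]]
print(gamma(example_table))  # 0.6
```

Because gamma ignores ties, it tends to be larger in magnitude than tau b or tau c computed on the same table.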

Difference of means and difference of proportions tests are appropriate when one of the variables is interval or ratio.

  • The chapter includes an example of a z-test and a t-test. The z-test uses the normal distribution table (or z-table), and the t-test uses the Student’s t-distribution table (t-table) to establish statistical significance.
  • Confidence intervals can be used as well.
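
A difference of proportions z-test can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the chapter's own worked example: it assumes two large independent samples, uses a pooled standard error, and replaces the z-table lookup with the standard library's `statistics.NormalDist`. The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Return the z statistic and two-sided p-value for p_a - p_b."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 60 of 100 in group A and 45 of 100 in group B.
z, p_value = two_proportion_z(60, 100, 45, 100)
print(round(z, 2), round(p_value, 3))
```

A p-value below the chosen significance level (0.05 is a common choice) would lead you to reject the null hypothesis that the two proportions are equal.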

Pearson’s r is a measure of the strength and direction of the linear correlation between two interval or ratio level variables.
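
The calculation behind Pearson's r can be sketched as the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. The function name and the data below are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r for two equal-length lists of interval or ratio values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical data with a perfect positive linear relationship.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```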

Many measures of association are bounded between positive one (perfect positive association) and negative one (perfect negative association), with zero in the middle indicating no relationship. Regardless of the upper and lower limits, the absolute size of the coefficient indicates the strength of the relationship.

Two variables are statistically independent if and only if the probability of observing any combination of categories equals the marginal probability of one category times the marginal probability of the other.

  • Statistical independence, and the statistical significance of a relationship, can be tested by using a chi-square test.
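
The link between independence and the chi-square test can be made concrete: the expected count in each cell is what the marginal totals would produce under independence (row total times column total, divided by N), and the statistic sums the squared gaps between observed and expected counts, each divided by the expected count. The sketch below uses a hypothetical function name and hypothetical counts, and it stops at the statistic and degrees of freedom, leaving the comparison against a chi-square table to the reader.

```python
def chi_square(table):
    """Return the chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# Hypothetical 2x2 table of observed counts.
statistic, df = chi_square([[10, 5], [5, 10]])
print(round(statistic, 2), df)
```

With one degree of freedom, the critical value at the 0.05 level is 3.841, so a statistic of about 3.33 would not lead you to reject the hypothesis of independence at that level.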

Regression analysis is a method for describing how, how strongly, and under what conditions an independent and a dependent variable are associated. It can also be used to make causal inferences.

The ordinary least squares regression equation is Y = a + bX, which describes a straight line. Y is the dependent variable, a is the y-intercept (or constant), b is the slope, and X is the independent variable.

  • If b is positive the relationship is positive, and if b is negative the relationship is negative.
  • Regression provides the best-fit line by minimizing the sum of the squared vertical distances from each data point to the line, that is, by minimizing the sum of squared errors.
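
For a single independent variable, the least squares slope and intercept have a closed form: b is the sum of cross-products of deviations divided by the sum of squared deviations of X, and a is the mean of Y minus b times the mean of X. A minimal sketch, with a hypothetical function name and hypothetical data:

```python
def ols_fit(x, y):
    """Return (a, b) for the least squares line Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: covariation of X and Y relative to the variation in X.
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    b = cross / ss_x
    # Intercept: the line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data generated from Y = 1 + 2X with no error.
a, b = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```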

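You can interpret regression results as follows: a one-unit increase in the independent variable is associated with a change of b units in the dependent variable (an increase if b is positive, a decrease if b is negative). R squared indicates how well the regression line fits the data: it is the proportion of the variation in the dependent variable that is explained by the independent variable.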
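
R squared can be computed as one minus the ratio of the sum of squared errors around the regression line to the total sum of squares around the mean of Y. The sketch below takes an intercept a and slope b as given; the function name, the data, and the coefficient values are hypothetical.

```python
def r_squared(x, y, a, b):
    """Share of the variation in y accounted for by the line Y = a + bX."""
    mean_y = sum(y) / len(y)
    # Squared errors between observed values and the line's predictions.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # Total squared variation of y around its mean.
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


# Hypothetical data; a = 1.0 and b = 0.5 are the least squares values for it.
print(r_squared([1, 2, 3], [1, 3, 2], a=1.0, b=0.5))  # 0.25
```

In this two-variable case R squared equals the square of Pearson's r, so an r of 0.5 corresponds to an R squared of 0.25.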