Chapter Summary

Chapter Objectives

14.1: Understand the logic behind an ordinary least squares (OLS) regression.
14.2: Describe how to calculate a bivariate regression.
14.3: Explain how to interpret bivariate regression results and test hypotheses.
14.4: Describe why one would include multiple independent variables in a regression to control for other sources of variation.
14.5: Explain how to interpret multivariate regression results and test hypotheses.
14.6: Understand the logic behind a maximum likelihood analysis.
14.7: Explain how to interpret logistic regression results and test hypotheses. 

  • A regression analysis is a technique for measuring the relationship between two interval- or ratio-level variables. Researchers use regression to make causal assertions, rather than merely assertions about correlation. To make these assertions, researchers rely on regression coefficients, which are estimates of unobserved population parameters.
  • There are ten classical assumptions for linear regression models.
  • Graphing the data is the first step in a regression analysis. The most common graph is the scatterplot, which shows at a glance the form and strength of a relationship.
  • Establishing causal relationships centers on the mean of the dependent variable and on explaining variation around that mean.
  • In a bivariate regression, researchers can plot a regression line that represents the relationship between the independent and dependent variables.
  • The ordinary least squares regression formula is Y = a + bX and describes a straight line: Y is the dependent variable, a is the y-intercept (or constant), b is the slope, and X is the independent variable.
    • If b is positive, the relationship is positive, and if b is negative, the relationship is negative.
    • Regression provides the best-fit line by minimizing the squared distances from each data point to the line, that is, by minimizing the squared errors (a worked sketch follows this list).
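
As a concrete illustration, here is a minimal sketch in Python that computes the OLS slope and intercept by hand and reports the squared errors being minimized. The income and turnout data, and all variable names, are invented for the example.

```python
import numpy as np

# Hypothetical data: district income in $1,000s (x) and turnout in percent (y).
x = np.array([10.0, 15.0, 20.0, 25.0, 30.0, 35.0])
y = np.array([48.0, 55.0, 53.0, 62.0, 66.0, 70.0])

# Closed-form OLS estimates: the slope b is cov(x, y) / var(x), and the
# intercept a is chosen so the line passes through the point of means.
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

y_hat = a + b * x                # points on the fitted line
sse = np.sum((y - y_hat) ** 2)   # the squared errors that OLS minimizes

print(f"Y = {a:.2f} + {b:.2f}X   (sum of squared errors = {sse:.2f})")
```
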
  • In a regression analysis, the dependent variable is a continuous interval- or ratio-level variable. A regression analysis draws on several related equations, including those for the slope, the intercept, and measures of fit.
  • A statistic related to regression is Pearson’s r, the correlation coefficient, which indicates the strength of association between two variables.
  • Pearson’s r can be used to calculate another statistic, R-squared (R²). R-squared is a commonly reported statistic interpreted as the percentage of variation in Y that is explained by variation in the independent variable (see the sketch below).
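
A short sketch of both statistics, reusing the same hypothetical data as above; in the bivariate case, R-squared is literally Pearson’s r squared.

```python
import numpy as np

x = np.array([10.0, 15.0, 20.0, 25.0, 30.0, 35.0])
y = np.array([48.0, 55.0, 53.0, 62.0, 66.0, 70.0])

# Pearson's r: the covariance of x and y scaled by their standard deviations.
r = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))

# In a bivariate regression, R-squared equals r squared: the share of the
# variation in Y accounted for by variation in the independent variable.
print(f"r = {r:.3f}, R-squared = {r ** 2:.3f}")
```
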
  • Multiple regression analysis extends the bivariate regression analysis presented earlier in this chapter to include additional independent variables.
    • Both types of regression involve finding an equation that best fits or approximates the data and describes the relationship between the independent and dependent variables.
    • A multivariate regression coefficient is a number that tells how much Y will change for a one-unit change in a particular independent variable, holding all the other variables in the model constant (illustrated in the sketch after this list).
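
A minimal sketch of a multivariate fit, again with invented data: each estimated slope is read as the change in Y for a one-unit change in its variable, with the other independent variable held constant.

```python
import numpy as np

# Hypothetical data: turnout (y) regressed on income (x1) and education (x2).
x1 = np.array([10.0, 15.0, 20.0, 25.0, 30.0, 35.0])
x2 = np.array([8.0, 10.0, 9.0, 12.0, 13.0, 14.0])
y = np.array([48.0, 55.0, 53.0, 62.0, 66.0, 70.0])

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares coefficients: each slope is the partial effect of its
# variable, holding the other independent variable constant.
a, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"Y = {a:.2f} + {b1:.2f}*income + {b2:.2f}*education")
```
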
  • A dummy variable has two categories, generally coded 1 for the presence of a characteristic and 0 otherwise. Recoding a nominal-level variable as a dummy variable allows the variable to be used in numerical analysis.
  • One can include an interaction term to determine whether variables behave differently in the presence of a third variable (see the sketch below).
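
The sketch below, with hypothetical data, recodes a nominal region variable as a 1/0 dummy and adds an income-by-region interaction; the interaction coefficient indicates whether income’s effect differs across regions.

```python
import numpy as np

# Hypothetical data: region recoded as a dummy (1 = South, 0 = otherwise),
# plus income and turnout.
south = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
income = np.array([10.0, 15.0, 20.0, 25.0, 30.0, 35.0])
y = np.array([42.0, 55.0, 50.0, 62.0, 60.0, 70.0])

# The interaction term is simply the product of the two variables; its
# coefficient tells us whether income's slope differs in the South.
X = np.column_stack([np.ones_like(income), south, income, south * income])
coefs = np.linalg.lstsq(X, y, rcond=None)[0]

for name, c in zip(["intercept", "south", "income", "south*income"], coefs):
    print(f"{name}: {c:.2f}")
```
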
  • There are different maximum likelihood models analysts can use to analyze dichotomous dependent variables; we discuss results from only one: logistic regression, also known as logit.
    • Maximum likelihood estimation is a class of estimators that chooses the set of parameters that provides the highest probability of observing a particular outcome (sketched after this list).
    • Maximum likelihood models work differently than regression to account for the limited range in the dependent variable.
    • You cannot interpret a maximum likelihood model in the same way that you interpret regression.
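
To make the logic concrete, this sketch fits a logit by directly maximizing the log-likelihood (equivalently, minimizing its negative) with scipy; the education and turnout data are invented for the example.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: years of education and whether the respondent voted.
educ = np.array([8.0, 10.0, 12.0, 12.0, 14.0, 16.0, 16.0, 18.0])
voted = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0])
X = np.column_stack([np.ones_like(educ), educ])

def neg_log_likelihood(beta):
    # The logistic function maps the linear predictor to a probability.
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    # MLE picks the coefficients under which the observed 0s and 1s were
    # most probable, i.e., it minimizes this negative log-likelihood.
    return -np.sum(voted * np.log(p) + (1.0 - voted) * np.log(1.0 - p))

fit = minimize(neg_log_likelihood, x0=np.zeros(2))
print("logit coefficients (intercept, education):", np.round(fit.x, 3))
```
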
  • A (nonlinear) logistic regression is usually a better choice than a linear regression for a binary dependent variable.
  • A logistic regression is interpreted differently than a multiple regression. In a logistic regression, the effect of a change in one independent variable depends on the values at which the other independent variables are set (such as the mean or one standard deviation above the mean).
  • Unlike with OLS regression, researchers cannot interpret the magnitude of the coefficients in maximum likelihood models, so political scientists turn to various tools for interpretation beyond the coefficients.
    • Those tools rely on predicted probabilities.
    • They can be reported in a table or used to generate various graphical representations, as in the sketch below.
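
For instance, the sketch below converts illustrative logit coefficients (invented values, standing in for estimates like those above) into predicted probabilities at several education levels; these are the numbers one would report in a table or plot.

```python
import numpy as np

# Illustrative logit coefficients: intercept and education slope.
# (Invented values standing in for estimates from a fitted model.)
beta0, beta1 = -6.0, 0.45

def p_vote(educ_years):
    # The logistic function turns the linear predictor into a probability.
    return 1.0 / (1.0 + np.exp(-(beta0 + beta1 * educ_years)))

# Predicted probability of voting at chosen values of education; such
# values fill the tables and graphs used to interpret logit results.
for years in (8, 12, 16, 20):
    print(f"{years} years of education -> P(vote) = {p_vote(years):.2f}")
```
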
  • Regression is an important tool used by political science researchers and students.