Chapter Summary

Chapter Objectives

14.1: Understand the logic behind an ordinary least squares (OLS) regression.
14.2: Describe how to calculate a bivariate regression.
14.3: Explain how to interpret bivariate regression results and test hypotheses.
14.4: Describe why one would include multiple independent variables in a regression to control for other sources of variation.
14.5: Explain how to interpret multivariate regression results and test hypotheses.
14.6: Understand the logic behind a maximum likelihood analysis.
14.7: Explain how to interpret logistic regression results and test hypotheses.

A regression analysis is a technique for measuring the relationship between two interval- or ratio-level variables. Researchers use regression to make causal assertions, rather than assertions about correlation alone. To support those assertions, they rely on regression coefficients, which are estimates of the unobserved population parameters.
There are ten classical assumptions for linear regression models.
Graphs provide the first step when conducting a regression analysis. A scatterplot is a common choice because it shows at a glance the form and strength of a relationship.
Establishing causal relationships is about the mean and variation from the mean.
In a bivariate regression, researchers can plot a regression line that represents the relationship between the independent and dependent variables.
The ordinary least squares regression formula is Y = a + bx, which describes a straight line. Y is the dependent variable, a is the y-intercept (or constant), b is the slope, and x is the independent variable.
- If b is positive, the relationship is positive, and if b is negative, the relationship is negative.
- Regression provides the best-fit line by minimizing the sum of the squared vertical distances from each data point to the line, that is, by minimizing the squared errors.
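As a minimal sketch of the calculation, the Python below computes a and b for the line Y = a + bx using the standard least squares formulas (slope equals the covariation of x and Y divided by the variation in x). The data values and the function names `ols_fit` and `sum_squared_errors` are made up for illustration and do not come from the chapter.

```python
def ols_fit(x, y):
    """Return (a, b) for the line Y = a + bx that minimizes the squared errors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: covariation of x and Y divided by the variation in x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


def sum_squared_errors(x, y, a, b):
    """Sum of squared vertical distances from each point to the line."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Illustrative (made-up) data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = ols_fit(x, y)
print(f"a = {a:.2f}, b = {b:.2f}")  # a = 2.20, b = 0.60

# Any other line produces a larger sum of squared errors than the OLS line.
print(sum_squared_errors(x, y, a, b) < sum_squared_errors(x, y, 2.0, 0.7))
```

Because b is positive here, the sketch describes a positive relationship: each one-unit increase in x is associated with a 0.6-unit increase in the predicted Y.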
In a regression analysis, the dependent variable is a continuous interval- or ratio-level variable. Several equations can be used in a regression analysis.
A statistic related to regression is Pearson’s r, the correlation coefficient. Pearson’s r indicates the strength and direction of the association between two variables.
You can further use Pearson’s r to calculate another statistic called R-squared or R². R-squared is a commonly reported statistic interpreted as the percentage of variation in Y that is explained by the variation in the independent variable.
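The relationship between the two statistics can be sketched in a few lines of Python. This is an illustration under stated assumptions: the data are made up, the function name `pearson_r` is hypothetical, and squaring r to obtain R-squared applies to the bivariate case described here.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Illustrative (made-up) data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_squared = r ** 2  # bivariate case: R-squared is the square of Pearson's r

print(f"r = {r:.4f}")                   # r = 0.7746
print(f"R-squared = {r_squared:.2f}")   # R-squared = 0.60
```

With these numbers, about 60 percent of the variation in Y is explained by the variation in the independent variable.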
Multiple regression analysis extends the bivariate regression analysis presented in Chapter 13 to include additional independent variables.
- Both types of regression involve finding an equation that best fits or approximates the data and describes the relationship between the independent and dependent variables.
- A multivariate regression coefficient is a number that tells how much Y will change for a one-unit change in a particular independent variable, if all the other variables in the model have been held constant.
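To make the idea of "holding the other variables constant" concrete, the sketch below fits a regression with two independent variables by solving the least squares normal equations in plain Python. The data are made up and were generated without error from Y = 1 + 2(X1) - 3(X2), so the recovered coefficients can be checked by eye; the helper names `solve_linear_system` and `multiple_ols` are hypothetical. In practice a statistics package would do this work.

```python
def solve_linear_system(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def multiple_ols(rows, y):
    """Return [a, b1, b2, ...] minimizing squared errors.

    rows holds one list of independent-variable values per observation.
    """
    X = [[1.0] + list(r) for r in rows]  # leading 1 is the constant term
    k = len(X[0])
    xtx = [[sum(xr[i] * xr[j] for xr in X) for j in range(k)] for i in range(k)]
    xty = [sum(xr[i] * yi for xr, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up observations: [X1, X2], built so that Y = 1 + 2*X1 - 3*X2 exactly.
rows = [[1, 0], [2, 1], [3, 1], [4, 0], [5, 1], [6, 0]]
y = [3, 2, 4, 9, 8, 13]

coefficients = multiple_ols(rows, y)
print([round(c, 3) for c in coefficients])
```

The coefficient on X1 is read as the change in Y for a one-unit change in X1 with X2 held constant, and likewise for X2.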
A dummy variable has two categories, generally coded 1 for the presence of a characteristic and 0 otherwise. Recoding a nominal-level variable as a dummy variable allows the variable to be used in numerical analysis.
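A short sketch of the recoding step, assuming a hypothetical nominal variable called `region` with made-up values; the function name `to_dummy` is also an illustration rather than anything from the chapter.

```python
def to_dummy(values, present):
    """Code 1 where the characteristic is present and 0 otherwise."""
    return [1 if value == present else 0 for value in values]


# Hypothetical nominal-level variable.
region = ["South", "West", "South", "Northeast", "Midwest"]

south = to_dummy(region, "South")
print(south)  # [1, 0, 1, 0, 0]
```

The resulting column of 0s and 1s can then be entered into a regression like any other independent variable.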
One can measure an interaction to determine whether the relationship between two variables differs depending on the value of a third variable.
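One common way to model an interaction, offered here as an assumption rather than as the chapter's own specification, is to add the product of two independent variables to the equation: Y = a + b1(X) + b2(D) + b3(X times D), where D is a dummy variable. The coefficient values below are hypothetical and chosen only to show how the slope on X changes with D.

```python
def predict(x, d, a, b1, b2, b3):
    """Predicted Y from Y = a + b1*X + b2*D + b3*(X*D)."""
    return a + b1 * x + b2 * d + b3 * (x * d)


def slope_of_x(d, b1, b3):
    """Change in predicted Y for a one-unit change in X at a given D."""
    return b1 + b3 * d


# Hypothetical coefficients, not estimates from real data.
a, b1, b2, b3 = 1.0, 0.5, 2.0, 1.5

print(slope_of_x(0, b1, b3))  # slope of X when D = 0
print(slope_of_x(1, b1, b3))  # slope of X when D = 1
```

When b3 is not zero, the effect of X on Y is different for the two groups defined by D, which is what an interaction is meant to capture.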
Analysts can use several maximum likelihood models to analyze dichotomous dependent variables; we will discuss results from only one: logistic regression, also known as logit.
- Maximum likelihood estimation is a class of estimators that choose the set of parameters providing the highest probability of observing the outcomes in the data.
- Maximum likelihood models work differently than regression to account for the limited range in the dependent variable.
- You cannot interpret a maximum likelihood model in the same way that you interpret regression.
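The logic of choosing the parameters that make the observed outcomes most probable can be sketched with a simple grid search over candidate values of a and b in a logit model. This is an illustration only: the data are made up, the grid is coarse, the function names are hypothetical, and real software uses iterative numerical methods rather than a grid.

```python
import math


def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_likelihood(a, b, x, y):
    """Log of the probability of observing y under a logit model with a and b."""
    total = 0.0
    for xi, yi in zip(x, y):
        z = a + b * xi
        if yi == 1:
            total += -log1pexp(-z)  # log of P(Y = 1)
        else:
            total += -log1pexp(z)   # log of P(Y = 0)
    return total


# Made-up data with a dichotomous dependent variable.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]

# Try every (a, b) pair on a coarse grid and keep the most likely one.
candidates = [(i * 0.5, j * 0.25) for i in range(-20, 21) for j in range(-12, 13)]
best_a, best_b = max(candidates, key=lambda ab: log_likelihood(ab[0], ab[1], x, y))
best_ll = log_likelihood(best_a, best_b, x, y)

print(best_a, best_b, round(best_ll, 3))
print(round(log_likelihood(0.0, 0.0, x, y), 3))  # a and b both zero, for comparison
```

The pair of values with the highest log-likelihood is the (approximate) maximum likelihood estimate; setting both parameters to zero makes the observed outcomes noticeably less probable.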
A (nonlinear) logistic regression is usually a better choice than a linear regression for a binary dependent variable.
A logistic regression is interpreted differently than a multiple regression. The effect of an independent variable on the predicted probability changes depending on the value at which each independent variable is set (like the mean or one standard deviation above the mean).
The difference between interpreting logit results and OLS regression results is that we cannot directly interpret the magnitude of a logit coefficient.
Because the same is true of maximum likelihood models generally, political scientists turn to various tools for additional interpretation beyond the coefficients.
- Those tools rely on predicted probabilities.
- They can be reported in a table or used to generate various graphical representations.
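A minimal sketch of how predicted probabilities are obtained from logit coefficients, using the standard logistic function. The coefficient values are hypothetical, not results from the chapter, and the function name `predicted_probability` is an illustration.

```python
import math


def predicted_probability(a, b, x):
    """P(Y = 1) from a logit model with constant a and coefficient b."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


# Hypothetical logit coefficients.
a, b = -4.0, 1.0

# A small table of predicted probabilities at chosen values of X.
for x in (2, 4, 6, 8):
    print(x, round(predicted_probability(a, b, x), 3))

# The same one-unit change in X shifts the probability by different amounts
# depending on where X starts, which is why the coefficient's magnitude
# cannot be read directly.
change_near_middle = predicted_probability(a, b, 5) - predicted_probability(a, b, 4)
change_near_top = predicted_probability(a, b, 7) - predicted_probability(a, b, 6)
print(change_near_middle > change_near_top)
```

Values like these can be reported in a table or plotted against X to show how the probability of the outcome changes across the range of an independent variable.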
Regression is an important tool used by political science researchers and students.

Political Science Research Methods