14.1: Understand the logic behind an ordinary least squares (OLS) regression.
14.2: Describe how to calculate a bivariate regression.
14.3: Explain how to interpret bivariate regression results and test hypotheses.
14.4: Describe why one would include multiple independent variables in a regression to control for other sources of variation.
14.5: Explain how to interpret multivariate regression results and test hypotheses.
14.6: Understand the logic behind a maximum likelihood analysis.
14.7: Explain how to interpret logistic regression results and test hypotheses.
- A regression analysis is a technique for measuring the relationship between two interval- or ratio-level variables. Regression is used to make causal assertions, rather than assertions about correlation. To make those assertions, researchers rely on regression coefficients, which are estimates of the unobserved population parameters.
- There are ten classical assumptions for linear regression models.
- Graphs provide the first step when conducting a regression analysis. A commonly used graph is the scatterplot, which shows at a glance the form and strength of a relationship.
- Establishing causal relationships is about the mean and variation from the mean.
- In a bivariate regression, researchers can plot a regression line that represents the relationship between the independent and dependent variables.
- The ordinary least squares regression equation is Y = a + bX, which describes a straight line. Y is the dependent variable, a is the y-intercept (or constant), b is the slope, and X is the independent variable.
- If b is positive, the relationship is positive, and if b is negative, the relationship is negative.
- Regression provides the best-fit line by minimizing the sum of the squared vertical distances from each data point to the line, that is, by minimizing the squared errors.
- In a regression analysis, the dependent variable is a continuous ratio-level variable. There are various equations that can be utilized in a regression analysis.
- A statistic related to regression is Pearson’s r, the correlation coefficient. Pearson’s r indicates the level of association between two variables.
- You can further use Pearson’s r to calculate another statistic called R-squared, or R². R-squared is a commonly reported statistic interpreted as the percentage of variation in Y that is explained by the variation in the independent variable.
- Multiple regression analysis extends the bivariate regression analysis presented in Chapter 13 to include additional independent variables.
- Both types of regression involve finding an equation that best fits or approximates the data and describes the relationship between the independent and dependent variables.
- A multivariate regression coefficient is a number that tells how much Y will change for a one-unit change in a particular independent variable, if all the other variables in the model have been held constant.
- A dummy variable has two categories, generally coded 1 for the presence of a characteristic and 0 otherwise. Recoding a nominal-level variable as a dummy variable allows the variable to be used in numerical analysis.
- One can measure an interaction to determine whether variables behave differently in the presence of a third.
- There are different maximum likelihood models analysts can use to analyze dichotomous dependent variables; we will only discuss results from one: logistic regression, also known as logit.
- Maximum likelihood estimation is a class of estimators that choose the set of parameters that provides the highest probability of observing a particular outcome.
- Maximum likelihood models work differently than regression to account for the limited range in the dependent variable.
- You cannot interpret a maximum likelihood model in the same way that you interpret regression.
- A (nonlinear) logistic regression is usually a better choice for a binary dependent variable.
- A logistic regression is interpreted differently than a multiple regression. The effect of an independent variable on the predicted probability changes depending on the values at which the independent variables are set (like the mean or one standard deviation above the mean).
- The difference between interpreting logit results and OLS regression results is that we cannot interpret the magnitude of the coefficient.
- Researchers cannot interpret the magnitude of the coefficients in maximum likelihood models, so political scientists turn to various tools for additional interpretation beyond the coefficients.
- Those tools rely on predicted probabilities.
- They can be reported in a table or used to generate various graphical representations.
- Regression is an important tool used by political science researchers and students.
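
The bivariate points above (the line Y = a + bX, the least squares criterion, Pearson’s r, and R-squared) can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's own procedure: the function name `bivariate_ols` and the five data points are hypothetical, and the formulas are the standard deviation-from-the-mean versions for a single independent variable.

```python
from math import sqrt

def bivariate_ols(x, y):
    """Fit Y = a + bX by ordinary least squares; return (a, b, r, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and cross-products around the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx               # slope: the value that minimizes squared errors
    a = mean_y - b * mean_x     # intercept: the line passes through the means
    r = sxy / sqrt(sxx * syy)   # Pearson's r
    return a, b, r, r ** 2      # in the bivariate case, R-squared is r squared

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b, r, r_squared = bivariate_ols(x, y)
print(f"Y = {a:.2f} + {b:.2f}X, r = {r:.3f}, R-squared = {r_squared:.2f}")
```

With these made-up values the slope is positive, so the relationship is positive, and R-squared is read as the share of the variation in Y explained by X.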
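
The points about logistic regression and predicted probabilities can be illustrated the same way. The sketch below assumes a logit model with two independent variables. The intercept, the coefficients, and the helper name `predicted_probability` are all hypothetical, chosen only to show why the same one-unit change in a variable produces a different change in probability depending on where the other variable is set.

```python
from math import exp

def predicted_probability(intercept, coefs, values):
    """Convert a logit linear predictor into a probability between 0 and 1."""
    z = intercept + sum(b * v for b, v in zip(coefs, values))
    return 1 / (1 + exp(-z))

# Hypothetical logit estimates (not from any real model)
intercept = -1.0
coefs = [0.8, 0.5]  # coefficients on x1 and x2

# Effect of moving x1 from 0 to 1 while x2 is held at 0
effect_low = (predicted_probability(intercept, coefs, [1, 0])
              - predicted_probability(intercept, coefs, [0, 0]))

# The same move in x1 while x2 is held at 4
effect_high = (predicted_probability(intercept, coefs, [1, 4])
               - predicted_probability(intercept, coefs, [0, 4]))

print(f"x2 = 0: {effect_low:.3f}   x2 = 4: {effect_high:.3f}")
```

The coefficient on x1 is identical in both comparisons, yet the change in predicted probability differs, which is why results from such models are usually reported as predicted probabilities in a table or graph.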