SPSS Exercises

Demonstration 1: Producing Scatterplots (Scatter Diagrams)
         Do people with more education work more hours per week? Some may argue that those with lower levels of education are forced to work low-paying jobs, thereby requiring them to work more hours per week to make ends meet. Others may rebut this argument by saying those with higher levels of education are in greater positions of authority, which requires more time to ensure operations run smoothly. This question can be explored with SPSS using the techniques discussed in this chapter for interval-ratio data because hours worked last week (HRS1) and number of years of education (EDUC) are both coded at an interval-ratio level in the GSS14SSDS-A file.
 
         We begin by looking at a scatterplot of these two variables. The Scatter procedure is found under the Graphs menu. Click Graphs, then Legacy Dialogs, then Scatter/Dot. In the dialog box that opens, select the icon for Simple Scatter (a standard scatterplot with two variables), and then click Define.
 
         The Scatterplot dialog box requires that we specify a variable for both the X- and Y-axes. We place EDUC (number of years of education) on the X-axis because we consider it the independent variable and HRS1 (number of hours worked last week) on the Y-axis because it is the dependent variable. Then, click OK.
 
         You can edit the scatterplot to change its appearance by double-clicking on the chart in the viewer. Double-clicking displays the chart in a chart window. You can then edit the chart from the menus, from the toolbar, or by double-clicking on the object you want to edit.
 
It is difficult to tell whether a relationship exists just by looking at points in the scatterplot, so we will ask SPSS to include the regression line. To add a regression line to the plot, we start by double-clicking on the scatterplot to open the Chart Editor. Click Elements from the main menu, then Fit Line at Total. In the section of the dialog box headed “Fit Method,” select Linear. Click Apply and then Close. Finally, in the Chart Editor, click File and then Close. The result of these actions is shown in Figure 11.1.
 
Figure 11.1. Scatterplot of Hours Worked by Education, Regression Line Plotted
 
         Since the regression line rises as number of years of education increases, we observe a positive relationship between education and number of hours worked last week. The predicted value for those with 20 years of education is about 44 hr, compared with 39.76 hr for those with 10 years of education. However, because there is a lot of scatter around the line (the points are not close to the regression line), the predictive power of the model is weak.
 
Demonstration 2: Producing Correlation Coefficients
         To further quantify the relationship between education and hours worked, we request a correlation coefficient. This statistic is available in the Bivariate procedure, which is located by clicking on Analyze, Correlate, then Bivariate (Figure 11.2). Place the variables you are interested in correlating, EDUC and HRS1, in the Variable(s) box, then click OK.
 
Figure 11.2. Bivariate Correlations Dialog Box 
         SPSS produces a matrix of correlations, as shown in Figure 11.3. We are interested in the correlation in the bottom left-hand cell, .084. The correlation is significant at the .05 level (two-tailed). Because this correlation is much closer to 0 than to 1, education is not a very good predictor of hours worked, even though those with more education do tend to work slightly more hours per week. The number under the correlation coefficient, 895, is the number of valid cases (N): those respondents who gave a valid response to both questions. This number is smaller than the full sample because respondents without a valid response on either variable are excluded.
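For readers who want to see what the Bivariate procedure computes, the following is a minimal sketch of Pearson's r in plain Python. The function name `pearson_r` and the six pairs of toy values are hypothetical illustrations, not GSS data and not SPSS output; only the value .084 comes from the text.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: covariation of X and Y divided by the square root
    of the product of their sums of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical toy values for illustration only (NOT from GSS14SSDS-A).
toy_educ = [10, 12, 12, 14, 16, 18]
toy_hours = [38, 40, 35, 45, 40, 42]
toy_r = pearson_r(toy_educ, toy_hours)
print("toy r:", round(toy_r, 3))

# Squaring the correlation reported in the text gives the share of
# variation explained.
reported_r = 0.084
print("r squared:", round(reported_r ** 2, 3))
```

Squaring the reported correlation of .084 gives about .007, which is why a correlation this close to 0 implies weak predictive power.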
 
Figure 11.3. Correlation Matrix for Hours Worked and Education
Demonstration 3: Producing a Regression Equation
         Next, we will use SPSS to calculate the best-fitting regression line and the coefficient of determination. This procedure is located by clicking on Analyze, Regression, then Linear. The Linear Regression dialog box (Figure 11.4) provides boxes in which to enter the dependent variable, HRS1, and the independent variable, EDUC (regression allows more than one independent variable). After you place the variables in their appropriate places, click OK to generate the output. The Linear Regression dialog box offers many other choices, but the default output from the procedure contains all that we need.
 
Figure 11.4. Linear Regression Dialog Box
 
         SPSS produces a great deal of output, which is typical for many of the more advanced statistical procedures in the program. The output is presented in Figure 11.5. Under the Model Summary, the coefficient of determination is labeled “R square.” Its value is .007, which is very weak. Educational attainment explains little of the variation in hours worked, less than 1%.
 
Figure 11.5. Linear Regression Output Specifying the Relationship Between Education and Number of Hours Worked Last Week
         The regression equation coefficients are presented in the Coefficients table, in the column headed “B.” The coefficient for EDUC, or b, is about .417; the intercept term, or a, identified in the “(Constant)” row, is 35.589. Thus, we would predict that every additional year of education increases the number of hours worked each week by about 25 min (.417 × 60). Or we could predict that those with a high school level of education (12 years) work, on average, 35.589 + (.417)(12) hr, or 40.59 hr.
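The arithmetic behind these predictions can be checked with a short sketch. The coefficients are those reported in the text; the function name `predicted_hours` is a hypothetical helper introduced only for illustration.

```python
# Coefficients as reported in the Coefficients table of the text.
INTERCEPT = 35.589  # a, the "(Constant)" row
SLOPE = 0.417       # b, the coefficient for EDUC

def predicted_hours(years_of_education):
    """Predicted hours worked last week: Y-hat = a + b * X."""
    return INTERCEPT + SLOPE * years_of_education

# The slope expressed in minutes per additional year of education.
slope_minutes = SLOPE * 60

for years in (10, 12, 20):
    print(years, "years:", round(predicted_hours(years), 2), "hr")
print("minutes per extra year:", round(slope_minutes, 1))
```

The same function reproduces the values read off the regression line in Demonstration 1: roughly 39.76 hr at 10 years of education and about 44 hr at 20 years.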
 
         The ANOVA table provides the results of the analysis of variance test. The table includes regression and residual sum of squares, as well as mean squares. To test the null hypothesis that r² is zero, you will only need the statistic shown in the last column labeled “Sig.” This is the P value associated with the F ratio listed in the column headed “F.” The F statistic is 6.368 and its associated P value is .012. This means that, if r² were really zero in the population, there would be only a small probability (.012) of obtaining an r² as large as the observed .007. The model, though it explains little of the variance in work hours, is significant. We are therefore able to reject the null hypothesis at the .05 level.
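As a rough cross-check, the F ratio for a bivariate regression can be recovered from r² and the number of cases. The sketch below assumes the regression used the same 895 valid cases reported with the correlation, and it starts from the rounded correlation of .084, so it only approximates the 6.368 that SPSS reports.

```python
r = 0.084   # correlation reported in the correlation matrix
n = 895     # valid cases reported with that correlation (assumed to apply here)

r_squared = r ** 2
df_regression = 1       # one predictor (EDUC)
df_residual = n - 2     # N minus the two estimated coefficients (a and b)

# F = (explained variance per df) / (unexplained variance per df)
f_ratio = (r_squared / df_regression) / ((1 - r_squared) / df_residual)

print("r squared:", round(r_squared, 3))
print("approximate F:", round(f_ratio, 2))
```

The result lands close to, but not exactly on, the reported F because the inputs are rounded to three decimals.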
 
Demonstration 4: Producing a Multiple Regression Equation
         What other variables, in addition to education, affect the number of hours worked per week? One possible answer to this question is that age (AGE) has something to do with the number of hours worked per week. To answer this question, we will use SPSS to calculate a multiple regression equation and a multiple coefficient of determination. This procedure is similar to the one used to generate the bivariate regression equation. Click Analyze, Regression, then Linear. We place EDUC (number of years of education) and AGE (age in years) in the box for the independent variables and HRS1 (the number of hours worked last week) in the box for the dependent variable, and click OK. The output is presented in Figure 11.6. 
 
         Under the Model Summary, the multiple correlation coefficient labeled “R” is .109. This tells us that education and age are weakly associated with hours worked last week. The coefficient of determination is labeled “R square.” Its value is .012. An R² of .012 means that educational attainment and age jointly explain just over 1% of the variation in hours worked last week. In addition, SPSS provides an “Adjusted R square,” which is .01. The adjusted R square adjusts the R² coefficient for the number of predictors in the equation. Generally, the larger the number of predictors, the lower the adjusted R² will be relative to R².
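The adjustment can be illustrated with the standard adjusted R² formula. This is a sketch under one loud assumption: the text does not report N for the multiple regression, so the 895 valid cases from the bivariate analysis are used as a stand-in, and the actual N may differ slightly if AGE has additional missing cases.

```python
multiple_r = 0.109   # "R" reported in the Model Summary
k = 2                # number of predictors: EDUC and AGE
n = 895              # ASSUMPTION: N borrowed from the bivariate analysis

r_squared = multiple_r ** 2

# Adjusted R-squared penalizes R-squared for the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print("R squared:", round(r_squared, 3))
print("adjusted R squared:", round(adjusted_r_squared, 3))
```

Under that assumption the result rounds to about .01, in line with the value SPSS reports, and it is slightly below R² as expected.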
Figure 11.6. Multiple Regression Output Specifying the Relationship Between Education, Age, and Number of Hours Worked Last Week
         The regression equation coefficients are listed in the Coefficients table, in the column headed “B.” The coefficient for EDUC is about .433, and for AGE, it is –.077. The intercept term, or a, identified in the “(Constant)” row, is 38.815. Thus, we would predict that, holding age constant, every additional year of education increases the number of hours worked the previous week by about 26 min (.433 × 60).
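The multiple regression equation can be applied the same way as the bivariate one. In the sketch below the coefficients come from the text, while the function name `predicted_hours` and the example respondent (12 years of education, age 40) are hypothetical choices made for illustration.

```python
# Coefficients as reported in the Coefficients table of the text.
INTERCEPT = 38.815   # a, the "(Constant)" row
B_EDUC = 0.433       # coefficient for EDUC
B_AGE = -0.077       # coefficient for AGE

def predicted_hours(educ_years, age_years):
    """Predicted hours worked last week: Y-hat = a + b1*EDUC + b2*AGE."""
    return INTERCEPT + B_EDUC * educ_years + B_AGE * age_years

# Partial slopes expressed in minutes, each holding the other predictor constant.
educ_minutes = B_EDUC * 60
age_minutes = B_AGE * 60

# Hypothetical respondent: 12 years of education, 40 years old.
print("predicted hours:", round(predicted_hours(12, 40), 2))
print("minutes per year of education:", round(educ_minutes, 1))
print("minutes per year of age:", round(age_minutes, 1))
```

The negative sign on the AGE coefficient means that, holding education constant, each additional year of age lowers predicted hours by a few minutes.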
 
SPSS Problems [GSS14SSDS-A]
         1. Explore the relationship between the number of siblings a respondent has (SIBS) and his or her number of children (CHILDS).
  a. Construct a scatterplot of these two variables in SPSS and place the best-fit linear regression line on the scatterplot. Describe the relationship between the number of siblings a respondent has (IV) and the number of his or her children (DV).
  b. Calculate the regression equation predicting CHILDS with SIBS. What are the intercept and the slope? What are the coefficient of determination and the correlation coefficient?
  c. What is the predicted number of children for someone with three siblings?
  d. What is the predicted number of children for someone without any siblings?
         2. Use the same variables as in Exercise 1 but do the analysis separately for men and women. Begin by locating the variable SEX. Click Data, Split File, and then select Organize Output by Groups. Insert SEX into the box and click OK. Now, SPSS will split your results by sex.
  a. Calculate the regression equation for men and women. (Note: You will need to scroll down through your output to find the results for men and women.) How similar are they?
  b. What is the predicted number of children for a man with six siblings? For a woman with the same number of siblings? Which group has the higher predicted number of children?
         3. Use the same variables as in Exercise 1 but do the analysis separately for White and Black respondents. Click Data, Split File, and then select Organize Output by Groups. Insert RACECEN1 into the box and click OK. SPSS will split your results by RACECEN1 (focusing your analysis only on the categories for Whites and Blacks). 
  a. Is there any difference between the regression equations for Whites and Blacks?
  b. What is the predicted number of children for Whites and Blacks with the same number of siblings: one sibling, four siblings, and seven siblings?
         
         4. Use the same variables as in Exercise 1 but do the analysis separately for married and divorced respondents. Begin by locating the variable MARITAL. Click Data, Split File, and then select Organize Output by Groups. Insert MARITAL into the box and click OK. SPSS will split your results by marital status.
  a. Is there any difference between the regression equations for married and divorced respondents?
  b. What is the predicted number of children for married and divorced respondents with the following number of siblings: one sibling, four siblings, and seven siblings?
  c. What differences, if any, do you find? Is the number of siblings a better predictor of number of children for married respondents or for divorced respondents?
         5. Investigate the relationship between the respondent’s education (EDUC) and the education received by his or her father and mother (PAEDUC and MAEDUC, respectively).
  a. Calculate the correlation coefficient, the coefficient of determination, and the regression equation predicting the respondent’s education with father’s education only. Interpret your results.
  b. Determine the multiple correlation coefficient, the multiple coefficient of determination, and the regression equation predicting the respondent’s education with father’s and mother’s education. Interpret your results.
  c. Did taking into account the respondent’s mother’s education improve our prediction? Discuss this on the basis of the results from 5b.
  d. Using the regression equation from 5a, calculate the predicted number of years of education for a person with a father with 12 years of education. Then, repeat this procedure, adding in a mother’s 12 years of education and using the regression equation from 5b.
  e. Review the ANOVA results. Can you reject the null hypothesis that R² = 0?