Essentials of Social Statistics for a Diverse Society
SPSS Exercises
[GSS10SSDS]
1. The majority of variables that social scientists study are not normally distributed. This doesn’t typically cause problems in analysis when the goal of a study is to calculate means and standard deviations—as long as sample sizes are greater than about 50. (This will be discussed in later chapters.) However, when characterizing the distribution of scores in one sample, or in a complete population (if this information is available), a non-normal distribution can cause complications. We can illustrate this point by examining the distribution of age in the GSS data file.
- Create a histogram for AGE (click on Graphs, Legacy Dialogs, Histogram; insert the variable age) with a superimposed normal curve (click on the option Display Normal Curve). How does the distribution of AGE deviate from the theoretical normal curve?
- Calculate the mean and standard deviation for AGE in this sample, using either the Frequencies or Descriptives procedure.
- Assuming the distribution of AGE is normal, use the mean and standard deviation from the previous step to calculate the number of people in the sample who should be 25 years of age or younger.
- Use the Frequencies procedure to construct a table of the percentage of cases at each value of AGE. Compare the theoretical calculation from the previous step with the actual distribution of AGE in the sample. What percentage of people in the sample are 25 years old or younger? Is this value close to what you calculated? Why might there be a discrepancy?
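The theoretical calculation in this problem can be sketched outside SPSS as follows. This is only an illustration in Python, not part of the SPSS procedure. The mean, standard deviation, and sample size below are placeholder values chosen for the example, not results from the GSS file; substitute the values you obtain from Frequencies or Descriptives.

```python
import math

def normal_cdf(x, mean, sd):
    """Proportion of a normal distribution with the given mean and SD at or below x."""
    z = (x - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Placeholder values (assumptions for illustration only, NOT GSS results).
mean_age = 48.0
sd_age = 17.0
n_cases = 1500

z_25 = (25 - mean_age) / sd_age          # Z score for age 25
p_25 = normal_cdf(25, mean_age, sd_age)  # expected proportion aged 25 or younger
expected_count = p_25 * n_cases          # expected number of cases

print(f"Z score for age 25: {z_25:.2f}")
print(f"Expected proportion at or below 25: {p_25:.4f}")
print(f"Expected number of cases: {expected_count:.0f}")
```

The expected proportion can then be set against the cumulative percent for AGE = 25 in the Frequencies table.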
2. SPSS will calculate standard scores for any distribution. Examine the distribution of EDUC (years of school completed).
- Have SPSS calculate Z scores for EDUC. (See the SPSS Demonstration above if this is unclear.)
- What is the equivalent Z score for someone who has completed 18 years of education?
- Use the Frequencies procedure to find the percentile rank for a score of 18 (the cumulative percentage of cases at or below 18).
- Does the percentile rank that you found from Frequencies correspond to the Z score for a value of 18? In other words, is the distribution for years of education normal? If so, then the Z score that SPSS calculates should be very close, after transforming it into an appropriate area, to the percentile rank for that same score.
- Create histograms for EDUC and the new variable ZEDUC. Explain why they have the same shape.
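The comparison in this problem can be sketched in Python as a rough illustration. The `educ` list is a made-up toy sample, not GSS data, and the helper names are hypothetical. The sketch assumes Z scores are computed with the sample standard deviation (n - 1 denominator) and that the percentile rank is the cumulative percentage of cases at or below the score.

```python
import math
from statistics import mean, stdev

def z_scores(values):
    """Standardize each value: subtract the mean, divide by the sample SD."""
    m = mean(values)
    s = stdev(values)  # sample SD, n - 1 denominator (assumed)
    return [(v - m) / s for v in values]

def percentile_rank(values, score):
    """Observed cumulative percentage of cases at or below score."""
    return 100.0 * sum(v <= score for v in values) / len(values)

def normal_percentile(z):
    """Percentile implied by a Z score IF the distribution were normal."""
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Toy data for illustration only (NOT from the GSS file).
educ = [8, 10, 12, 12, 12, 12, 13, 14, 14, 16, 16, 18, 20]

zeduc = z_scores(educ)
z_18 = (18 - mean(educ)) / stdev(educ)

print(f"Z score for 18 years: {z_18:.2f}")
print(f"Percentile implied by normal curve: {normal_percentile(z_18):.1f}")
print(f"Observed percentile rank: {percentile_rank(educ, 18):.1f}")
```

Because standardizing only shifts and rescales every score by the same constants, the order and relative spacing of the cases are unchanged, which is why a histogram of the Z scores has the same shape as a histogram of the raw scores.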
3. Repeat the procedure in Problem 2, this time running separate analyses of EDUC for men versus women (SEX) and for blacks versus whites (RACECEN1). Remember, you can run separate analyses using the Split File command: click on Data, Split File, Organize Output by Groups, and select either SEX or RACECEN1. Does EDUC differ between men and women, or between blacks and whites, in the GSS sample? How would you describe the distribution of EDUC for each of the four groups?
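The idea behind Split File, running the same summary once per group, can be sketched in Python. This is a loose analogue, not SPSS syntax. The records are invented toy values rather than GSS cases, and `describe_by_group` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean, stdev

def describe_by_group(records):
    """Return {group: (n, mean, sample SD)} for (group, value) pairs."""
    groups = defaultdict(list)
    for group, value in records:
        groups[group].append(value)
    return {g: (len(v), mean(v), stdev(v)) for g, v in groups.items()}

# Toy (group, years of education) pairs for illustration only (NOT GSS data).
records = [
    ("MALE", 12), ("MALE", 14), ("MALE", 16), ("MALE", 18),
    ("FEMALE", 10), ("FEMALE", 12), ("FEMALE", 14), ("FEMALE", 16),
]

summary = describe_by_group(records)
for group, (n, m, s) in summary.items():
    print(f"{group}: n = {n}, mean = {m:.2f}, SD = {s:.2f}")
```

The same grouping could be repeated with a race variable in place of sex to obtain the other two group summaries.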