Study Questions

The following questions use data from the Population Reference Bureau (PRB) on the relationship between women’s literacy (ages 15–24) and the prevalence of HIV in sub-Saharan Africa. Using the PRB’s Data Finder tool, data through the year 2011 were assembled for 14 countries in sub-Saharan Africa and are shown below.

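| Country | Female Literacy Rate (%) | Percent Women With HIV |
|---|---|---|
| Angola | 65 | 2.4 |
| Botswana | 96 | 29.2 |
| Democratic Congo | 78 | 4.1 |
| Eritrea | 84 | 0.9 |
| Madagascar | 68 | 0.1 |
| Malawi | 85 | 13.2 |
| Namibia | 95 | 15.7 |
| Nigeria | 65 | 4.4 |
| Rwanda | 77 | 3.5 |
| Sudan | 82 | 1.3 |
| Swaziland | 95 | 30.3 |
| Tanzania | 76 | 6.8 |
| Togo | 80 | 3.8 |
| Zambia | 68 | 16.0 |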
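
For readers who want to check their hand calculations by computer, the table above can be transcribed into Python. The following is a minimal sketch, not part of the original exercise: the variable names are illustrative assumptions, and it computes only the basic building blocks (means, sums of squares, the sum of cross-products, and the sample variances and covariance that follow from them).

```python
# Values transcribed from the table above:
# (country, female literacy rate in percent, percent of women with HIV).
data = [
    ("Angola", 65, 2.4),
    ("Botswana", 96, 29.2),
    ("Democratic Congo", 78, 4.1),
    ("Eritrea", 84, 0.9),
    ("Madagascar", 68, 0.1),
    ("Malawi", 85, 13.2),
    ("Namibia", 95, 15.7),
    ("Nigeria", 65, 4.4),
    ("Rwanda", 77, 3.5),
    ("Sudan", 82, 1.3),
    ("Swaziland", 95, 30.3),
    ("Tanzania", 76, 6.8),
    ("Togo", 80, 3.8),
    ("Zambia", 68, 16.0),
]

literacy = [row[1] for row in data]
hiv = [row[2] for row in data]
n = len(data)

# Means of the two variables.
mean_literacy = sum(literacy) / n
mean_hiv = sum(hiv) / n

# Sums of squared deviations and the sum of cross-products.
ss_literacy = sum((v - mean_literacy) ** 2 for v in literacy)
ss_hiv = sum((v - mean_hiv) ** 2 for v in hiv)
sp = sum((l - mean_literacy) * (h - mean_hiv) for l, h in zip(literacy, hiv))

# Sample variances and covariance (dividing by n - 1).
var_literacy = ss_literacy / (n - 1)
var_hiv = ss_hiv / (n - 1)
cov = sp / (n - 1)

print(f"n = {n}")
print(f"mean literacy = {mean_literacy:.2f}, mean HIV = {mean_hiv:.2f}")
print(f"variance literacy = {var_literacy:.2f}, variance HIV = {var_hiv:.2f}")
print(f"covariance = {cov:.2f}")
```

Dividing by n - 1 gives sample variances and a sample covariance; if your textbook divides by n instead, adjust the last three calculations accordingly.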

 
  1. A researcher advances the hypothesis that countries with lower female literacy rates tend to have a greater percentage of the female population infected with HIV. What is the independent variable here? What is the dependent variable here?
  2. Several statistics were calculated from these data for both the independent and the dependent variables. Identify (i.e., name) each of the statistics below.
     1. Eqn1
     2. Eqn2
     3. Eqn3
     4. Eqn4
  3. What is the direction of this relationship?
  4. Using the statistics provided in Question 2, calculate both the Y-intercept and the slope coefficient for the regression equation. Once you arrive at these quantities, write down the full regression equation in proper notation and provide a one-sentence interpretation of the slope coefficient.
  5. According to the regression equation that you calculated above, what is the predicted percentage of women with HIV in a hypothetical country where the female literacy rate is equal to 0?
  6. Using the statistics provided in Question 2, calculate the value of Pearson’s correlation coefficient, r, and provide a one-sentence interpretation of this quantity. Likewise, calculate the value of the coefficient of determination, r², and provide a one-sentence interpretation of this quantity.
  7. According to the above regression equation, what percentage of the variation in the dependent variable (the percentage of females with HIV) is explained by considering only the independent variable (female literacy rates)? What is the statistical term for this quantity, as it was referred to in Chapter 12 of your textbook? What percentage of the variation remains unexplained?
  8. The regression sum of squares reflects the improvement in the prediction error resulting from the use of the linear regression prediction equation. Calculate the value of the regression sum of squares if r² = .09 and the total sum of squares = 19.73.
  9. For each of the 14 countries considered in this analysis, what is the predicted value for the percentage of women with HIV? Identify the five countries with the largest residuals. Pick one of these countries and interpret its residual in statistical terms.
  10. Describe what you think would happen to the value of Σe² if the researcher added more independent variables to the regression equation.
  11. The equation Ŷ = a + bX is the general form for a bivariate regression equation. What would be the general form of the equation to summarize the addition of the percentage of women who work outside of the home to the original bivariate regression equation? Write this equation down and provide a statistical interpretation of each quantity in this regression equation.
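
The calculations these questions ask for all follow from a few sums, and it can help to see them laid out as code. The sketch below assumes the standard least-squares formulas (slope = sum of cross-products divided by the sum of squares of X; intercept = mean of Y minus slope times mean of X) and the identity that the regression sum of squares equals r² times the total sum of squares. The function names and the four toy data points are hypothetical illustrations, not PRB data.

```python
import math


def fit_line(x, y):
    """Least-squares fit of Y-hat = a + b*X; returns (a, b, r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)   # sum of squares of X
    ss_y = sum((yi - mean_y) ** 2 for yi in y)   # total sum of squares of Y
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sp_xy / ss_x                      # slope coefficient
    a = mean_y - b * mean_x               # Y-intercept
    r = sp_xy / math.sqrt(ss_x * ss_y)    # Pearson's correlation coefficient
    return a, b, r


def residuals(x, y, a, b):
    """Observed Y minus predicted Y for every case."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def regression_sum_of_squares(r_squared, ss_total):
    """Explained variation: r-squared times the total sum of squares."""
    return r_squared * ss_total


# Hypothetical toy values, used only to show the functions running.
toy_x = [1, 2, 3, 4]
toy_y = [2, 4, 5, 8]

a, b, r = fit_line(toy_x, toy_y)
e = residuals(toy_x, toy_y, a, b)
sum_e2 = sum(ei ** 2 for ei in e)         # error sum of squares

print(f"Y-hat = {a:.2f} + {b:.2f}X, r = {r:.3f}, r^2 = {r ** 2:.3f}")
print(f"residuals = {[round(ei, 2) for ei in e]}, sum of e^2 = {sum_e2:.2f}")

# The same identity applied to the quantities given in the question
# about the regression sum of squares.
ss_reg = regression_sum_of_squares(0.09, 19.73)
print(f"regression SS = {ss_reg:.4f}, unexplained SS = {19.73 - ss_reg:.4f}")
```

Passing the literacy and HIV columns from the table to `fit_line` in place of the toy values gives quantities to compare against hand calculations; sorting the residuals by absolute value identifies the countries the line fits least well.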