Chapter Summary

This chapter introduces the first step in applied statistics: data exploration.

Two simple tools for exploring data are the data matrix, an array of rows and columns that stores the observed values of variables, and the empirical frequency distribution, a table that shows the number of observations having each value of a variable. From a frequency distribution one can calculate relative frequencies and cumulative proportions.
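As a minimal sketch of how a frequency distribution leads to relative frequencies and cumulative proportions, the Python example below tabulates a small, made-up variable. The data and the variable name `siblings` are illustrative assumptions, not taken from the chapter.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical data: number of siblings reported by 10 respondents.
siblings = [0, 1, 1, 2, 2, 2, 3, 3, 4, 1]

counts = Counter(siblings)      # value -> number of observations
values = sorted(counts)         # distinct values in ascending order
n = len(siblings)               # total number of observations

frequencies = [counts[v] for v in values]
relative = [f / n for f in frequencies]                  # share of all cases
cumulative = [c / n for c in accumulate(frequencies)]    # share at or below each value

print(f"{'value':>5} {'frequency':>9} {'relative':>8} {'cumulative':>10}")
for v, f, r, c in zip(values, frequencies, relative, cumulative):
    print(f"{v:>5} {f:>9} {r:>8.2f} {c:>10.2f}")
```

The relative frequencies sum to 1, and the last cumulative proportion is always 1 because every observation is at or below the largest value.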

The chapter discusses two categories of descriptive statistics: measures of central tendency and measures of variability, or dispersion. Measures of central tendency describe the typical case in the data, while measures of dispersion describe how the rest of the data are distributed around the typical case.

Measures of central tendency include the mean (average value), trimmed mean (the mean computed after excluding a set number of observations from the high and low ends of the data), median (middle value when all cases are rank ordered), and mode (most commonly observed value).
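The four measures can be computed side by side, as in the sketch below. It uses Python's standard `statistics` module for the mean, median, and mode. The `trimmed_mean` helper and the exam scores are hypothetical, written only for this illustration, and the helper drops a fixed count `k` from each end (other definitions trim a percentage instead).

```python
import statistics

# Hypothetical data: seven exam scores.
scores = [55, 62, 62, 70, 74, 81, 95]

def trimmed_mean(values, k):
    """Mean after dropping the k lowest and k highest observations."""
    if 2 * k >= len(values):
        raise ValueError("k removes every observation")
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

mean = statistics.mean(scores)        # average value
trimmed = trimmed_mean(scores, k=1)   # drops 55 and 95 before averaging
median = statistics.median(scores)    # middle value of the ranked scores
mode = statistics.mode(scores)        # most commonly observed value

print(f"mean={mean:.2f} trimmed={trimmed:.2f} median={median} mode={mode}")
```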

Measures of dispersion include range (distance between minimum and maximum values), interquartile range (the range using the third and first quartiles in place of the maximum and minimum), variance (average squared distance between each value and the mean), and standard deviation (square root of the variance).
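The sketch below computes each of these measures of dispersion with Python's standard `statistics` module on made-up data. Two choices here are assumptions rather than the chapter's stated method: the variance divides by the number of observations (the population form, matching "average squared distance"), whereas the sample variance divides by n - 1; and the quartiles use the `inclusive` interpolation method, one of several conventions in use.

```python
import statistics

# Hypothetical data: eight observations with a mean of 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: distance between the minimum and maximum values.
data_range = max(data) - min(data)

# Interquartile range: third quartile minus first quartile.
# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Variance: average squared distance from the mean (population form).
variance = statistics.pvariance(data)

# Standard deviation: square root of the variance.
std_dev = statistics.pstdev(data)

print(f"range={data_range} IQR={iqr} variance={variance} sd={std_dev}")
```

Unlike the variance, the standard deviation is expressed in the same units as the data, which usually makes it easier to interpret.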

Outliers, or values that are far larger or smaller than the rest of the values in the data, affect some measures of central tendency and dispersion but not others, so some statistics are better choices than others depending on the nature of the data. For example, the mean and range are sensitive to outliers, while the median and interquartile range are largely unaffected. Some statistics are also better suited than others to particular levels of measurement.
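To illustrate how unevenly an outlier affects different statistics, the sketch below compares the mean, median, and range of a small data set before and after one extreme value is added. The income figures are invented for the example.

```python
import statistics

# Hypothetical incomes in thousands; the second list adds one extreme value.
typical = [30, 32, 35, 38, 40]
with_outlier = typical + [400]

def describe(values):
    """Return (mean, median, range) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        max(values) - min(values),
    )

mean_a, median_a, range_a = describe(typical)
mean_b, median_b, range_b = describe(with_outlier)

# The mean and range move a long way; the median barely changes.
print(f"without outlier: mean={mean_a:.2f} median={median_a} range={range_a}")
print(f"with outlier:    mean={mean_b:.2f} median={median_b} range={range_b}")
```

In this made-up case a single extreme observation pulls the mean well above every typical value, while the median stays close to the middle of the original data.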

Graphs, charts, and diagrams can be used to explore and present the distribution of data. The chapter includes examples of a bar chart, dot chart, histogram, boxplot, and time series plot, and explains the circumstances under which each might be best.

Such displays can convey information about data, including but not limited to central tendency, dispersion, the shape of a distribution, outliers, and relationships.