11.1: Understand how to compile a data matrix and summarize large batches of data.
11.2: Describe data with measures of central tendency and dispersion.
11.3: Explain how to graph data for presentation and exploration.
11.4: Describe how the early steps in making sense of data lay the groundwork for statistical inference.
- This chapter introduces the first step in applied statistics: data exploration.
- Two simple ways to explore data are a data matrix, an array of rows and columns that stores the observed values of variables, and an empirical frequency distribution, a table that shows the number of observations having each value of a variable. From a frequency distribution, one can calculate relative frequencies and cumulative proportions.
- The chapter discusses two categories of descriptive statistics: measures of central tendency and measures of dispersion (variability). Measures of central tendency describe the typical case in the data, while measures of dispersion describe how the rest of the data are distributed around the typical case.
- Measures of central tendency include the mean (average value), trimmed mean (the mean calculated after excluding a certain number of observations at the high and low ends), median (middle value when all cases are rank ordered), and mode (most commonly observed value).
- One important concern when measuring central tendency is the presence of outliers. An outlier is a value that is far greater or smaller than the other values recorded for a variable. Both the mode and the median are known as resistant measures because they are not sensitive to one or a few extreme values.
- Dispersion refers to how much the values of a variable differ from one another. Measures of dispersion include the range (distance between the minimum and maximum values), interquartile range (the range using the third and first quartiles in place of the maximum and minimum), variance (average squared distance between each value and the mean), and standard deviation (square root of the variance). A normal distribution is a distribution defined by a mathematical formula whose graph has a symmetrical bell shape.
- Graphs, charts, and diagrams can be used to explore and present the distribution of data. The chapter includes examples of a bar chart, dot chart, histogram, boxplot, and time-series plot, and explains the circumstances under which each might be best.
- Graphs, charts, and diagrams can be used to relay information about data including but not limited to central tendency, dispersion, the shape of a distribution, outliers, and relationships.
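
The descriptive statistics summarized above can be sketched in a few lines of Python. This is only an illustration, not the chapter's own procedure: the `scores` values are made up, the helper names `frequency_table`, `trimmed_mean`, and `describe` are hypothetical, and the choice of Python's standard `statistics` module is an assumption. The variance follows the summary's definition (average squared distance from the mean, i.e. dividing by the number of observations); many texts and software packages divide by n - 1 instead when working with a sample.

```python
from collections import Counter
import statistics

# Made-up observations for one variable; 40 is a deliberate outlier.
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 40]


def frequency_table(values):
    """Return rows of (value, frequency, relative frequency, cumulative proportion)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    running = 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value], counts[value] / n, running / n))
    return rows


def trimmed_mean(values, k):
    """Mean after dropping the k lowest and k highest observations."""
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k]
    return statistics.mean(kept)


def describe(values):
    """Collect the measures of central tendency and dispersion named above."""
    # statistics.quantiles with n=4 returns the three quartile cut points;
    # its default interpolation method is one of several conventions in use.
    q1, _, q3 = statistics.quantiles(values, n=4)
    variance = statistics.pvariance(values)  # divides by n, as in the summary
    return {
        "mean": statistics.mean(values),
        "trimmed_mean": trimmed_mean(values, 1),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "variance": variance,
        "std_dev": variance ** 0.5,
    }


if __name__ == "__main__":
    for row in frequency_table(scores):
        print(row)
    for name, value in describe(scores).items():
        print(name, value)
```

With these made-up scores the mean is 7.6, while the median and mode are both 4 and the trimmed mean is 4.25. The single outlier pulls the mean upward but leaves the resistant measures unchanged.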