Chapter Summary

A population is any well-defined set of units of analysis.

A sample is any subset of units collected in some manner from a population. Once a sample has been collected, one can compute sample statistics, which describe characteristics of the sample, and use them to estimate population parameters, which describe characteristics of the population.
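As a minimal illustration, a sample mean is a statistic computed from a subset of units and used to estimate the corresponding parameter, the population mean. The population of ages below is invented for the example and is not from the chapter.

```python
import random
import statistics

# Hypothetical population: ages of 1,000 people (illustrative data only).
rng = random.Random(42)
population = [rng.randint(18, 80) for _ in range(1000)]

# Population parameter: the mean age of every unit in the population.
mu = statistics.mean(population)

# Sample statistic: the mean age of a random subset of 100 units.
sample = rng.sample(population, k=100)
x_bar = statistics.mean(sample)

print(f"parameter mu = {mu:.2f}, statistic x_bar = {x_bar:.2f}")
```

In practice only `x_bar` is observed; `mu` is computed here only because the whole (made-up) population is available for comparison.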

Although studying the whole population would be the first choice for analysis, most research uses a sample because studying the entire population is usually not feasible.

A systematic difference between a sample and the population it is meant to represent is called bias. An unrepresentative sample will lead to inaccurate conclusions about the population.

The chapter discusses four types of probability samples: samples in which each element in the population has a known, nonzero probability of inclusion. The particular population from which a sample is actually drawn is called the sampling frame, and it must be clearly specified.

Probability sampling methods are the first choice because they tend to produce the most representative samples:

  • In a simple random sample, each element, and each combination of elements, in the population has an equal chance of selection.
  • A systematic sample is generated by selecting elements from a list of the population at a fixed, predetermined interval, usually after a random starting point (e.g., every 50th element on the list).
  • A stratified sample is drawn from a population that has been subdivided into two or more strata based on a single characteristic; elements are then selected from each stratum in proportion to that stratum’s representation in the entire population. A disproportionate stratified sample can also be useful when a researcher wishes to over-represent a group that, because of its small size in the population, would otherwise be unlikely to make up a large enough share of the sample to support sound inferences.
  • Cluster samples use groups of elements (for example, the 50 states in the union) as the initial sampling frame. Samples are then drawn from increasingly narrow groups (counties, then cities, then blocks) until the final sample of elements is drawn from the smallest group (individuals living in each household).
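The four probability designs can be sketched in Python. This is a minimal illustration under stated assumptions, not the chapter's own procedure: the sampling frame, the region labels, and the helper names (`simple_random_sample`, `systematic_sample`, `stratified_sample`, `cluster_sample`) are hypothetical, and the cluster function uses only two stages (groups, then elements) as a simplification of the multistage design described above.

```python
import random
from collections import defaultdict

def simple_random_sample(frame, n, rng):
    # Every element, and every combination of n elements, is equally likely.
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    # Interval k = N // n; begin at a random position within the first interval.
    k = len(frame) // n
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(frame, n, stratum_of, rng, allocation=None):
    # Subdivide the frame into strata using a single characteristic.
    strata = defaultdict(list)
    for element in frame:
        strata[stratum_of(element)].append(element)
    sample = []
    for label, members in strata.items():
        if allocation is None:
            # Proportionate: each stratum's share of n matches its share of the
            # population (rounding can make the total differ slightly from n).
            n_h = round(n * len(members) / len(frame))
        else:
            # Disproportionate: the researcher fixes each stratum's size.
            n_h = allocation[label]
        sample.extend(rng.sample(members, n_h))
    return sample

def cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    # Stage 1: randomly select whole groups (e.g., blocks).
    chosen = rng.sample(list(clusters), n_clusters)
    # Stage 2: randomly select elements within each selected group.
    sample = []
    for c in chosen:
        sample.extend(rng.sample(clusters[c], n_per_cluster))
    return sample

rng = random.Random(1)
# Hypothetical frame: 1,000 people, 30% in region "A" and 70% in region "B".
frame = [(i, "A" if i < 300 else "B") for i in range(1000)]
srs = simple_random_sample(frame, 100, rng)
syst = systematic_sample(frame, 100, rng)
prop = stratified_sample(frame, 100, lambda p: p[1], rng)
disprop = stratified_sample(frame, 100, lambda p: p[1], rng,
                            allocation={"A": 50, "B": 50})
# Hypothetical clusters: 50 blocks of 20 consecutive people each.
blocks = {b: frame[b * 20:(b + 1) * 20] for b in range(50)}
clus = cluster_sample(blocks, 10, 5, rng)
```

With this frame, the proportionate stratified sample contains exactly 30 people from region "A" and 70 from "B", while the disproportionate allocation over-represents the smaller region with 50 of 100.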

The chapter discusses four examples of nonprobability samples: samples in which each element’s probability of inclusion is unknown. Although generally less representative, these techniques are used to collect data when probability samples are not feasible.

  • A judgmental sample is typically used to study a small, diverse set of observations rather than to produce a sample representative of a larger target population. Observations are often hand-selected.
  • A quota sample is one in which elements are chosen nonprobabilistically (usually purposively or for convenience) in proportion to their representation in the population.
  • A snowball sample relies on members of the target population to identify other members for inclusion in the sample. It is particularly useful for studying hard-to-locate or hard-to-identify populations such as drug users, the homeless, or illegal immigrants.
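Two of these nonprobability designs have a mechanical core that can be sketched briefly; judgmental sampling depends on the researcher's hand selection and is not modeled. In this minimal sketch the people, categories, referral network, and function names (`quota_sample`, `snowball_sample`) are all hypothetical. The quota sampler accepts units in whatever order they happen to be encountered, which is exactly why inclusion probabilities are unknown, and the snowball sampler follows referrals outward from a few seed members.

```python
from collections import deque

def quota_sample(encountered, quotas, category_of):
    # Units arrive in whatever order is convenient (not at random); each is kept
    # only while the quota for its category is still unfilled.
    counts = {c: 0 for c in quotas}
    sample = []
    for element in encountered:
        c = category_of(element)
        if c in quotas and counts[c] < quotas[c]:
            sample.append(element)
            counts[c] += 1
        if counts == quotas:
            break
    return sample

def snowball_sample(referrals, seeds, max_size):
    # Begin with known members (seeds); each recruit names further members.
    sample, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical data: 20 people encountered in alternating M/F order.
people = [(f"p{i}", "F" if i % 2 else "M") for i in range(20)]
quota = quota_sample(people, {"M": 3, "F": 2}, lambda p: p[1])

# Hypothetical referral network among members of a hard-to-reach population.
referrals = {"ann": ["bob", "cara"], "bob": ["dan"], "cara": ["ann", "eve"],
             "dan": [], "eve": ["fay"]}
snowball = snowball_sample(referrals, ["ann"], max_size=4)
```

Here `quota` holds the first three men and first two women encountered, and `snowball` is `["ann", "bob", "cara", "dan"]`; anyone not reachable through referrals from the seed can never enter the sample.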

The major goal of statistical inference is to make supportable conjectures about the unknown characteristics of a population based on sample statistics.

The expected value of a sample statistic is its mean (average) value across repeated samples drawn from a population.

  • Sampling error is the discrepancy between an observed and a true value that arises because only a portion of a population is observed.
  • The mathematical term for the variation around the expected value is the standard error of the estimator, or standard error for short. The standard error provides a numerical indication of the variation in our sample estimates.
  • Generally, the larger the sample, the smaller the sampling error, as measured by the standard error.
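The expected value and the standard error can be checked by simulation. This sketch assumes a made-up, normally distributed population of 10,000 scores and simple random sampling without replacement; the function names are hypothetical. For each of three sample sizes it draws 1,000 samples: the average of the sample means should stay near the population mean (the expected value), while their spread (the standard error) should shrink roughly as σ/√n, multiplied here by the finite population correction √((N - n)/(N - 1)) because sampling is without replacement.

```python
import math
import random
import statistics

def sample_means(population, n, reps, rng):
    # Draw `reps` simple random samples of size n; record each sample mean.
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]

def summarize(population, sizes, reps, rng):
    N = len(population)
    mu = statistics.fmean(population)
    sigma = statistics.pstdev(population)
    rows = []
    for n in sizes:
        means = sample_means(population, n, reps, rng)
        expected = statistics.fmean(means)      # approximates E(x_bar)
        se_empirical = statistics.stdev(means)  # spread of the sample means
        # sigma / sqrt(n), times the finite population correction for sampling
        # without replacement.
        se_theory = sigma / math.sqrt(n) * math.sqrt((N - n) / (N - 1))
        rows.append((n, expected, se_empirical, se_theory))
    return mu, rows

rng = random.Random(7)
# Hypothetical population of 10,000 scores.
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu, rows = summarize(population, sizes=(10, 100, 1000), reps=1000, rng=rng)
print(f"population mean mu = {mu:.2f}")
for n, expected, se_emp, se_th in rows:
    print(f"n={n:5d}  mean of sample means={expected:6.2f}  "
          f"SE (simulated)={se_emp:5.2f}  SE (formula)={se_th:5.2f}")
```

Because the standard error falls with the square root of n, quadrupling the sample size only roughly halves it, so each further gain in precision costs proportionally more data.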