Chapter Summary

How researchers measure concepts can significantly affect findings, even leading to entirely different conclusions.

Operationalization is deciding how to record empirical observations of the occurrence of an attribute or a behavior using numerals or scores. By specifying the operational definition of a concept, the researcher makes its precise meaning clear and explicitly states the terms by which the concept will be measured.

The quality of measurements is judged in regard to both their accuracy and their precision.

Measurements may be inaccurate because they are unreliable or invalid.

Reliability is the extent to which an experiment, test, or any measuring procedure yields the same results on repeated trials. The chapter discusses three tests of reliability: the test-retest, alternative form, and split-halves methods.
- Each method requires the researcher to compare two or more sets of results for consistency in the answers; the more consistent the results, the more reliable the measure.
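
As an illustration of the comparison these methods rely on, the following is a minimal sketch of a split-halves check in Python. The data, the odd/even split, and the function names `pearson` and `split_half_reliability` are assumptions made for this example and do not come from the chapter; the Pearson correlation is used here as one common way to express consistency between the two halves.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def split_half_reliability(responses):
    """Correlate totals on one half of the items with totals on the other half.

    `responses` holds one row per respondent; each row lists that
    respondent's scores on the same items in the same order.
    """
    first_half = [sum(row[0::2]) for row in responses]   # items 1, 3, ...
    second_half = [sum(row[1::2]) for row in responses]  # items 2, 4, ...
    return pearson(first_half, second_half)


# Hypothetical data: five respondents, four items scored 1 to 5.
responses = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 5, 4, 4],
    [5, 4, 5, 5],
]
r = split_half_reliability(responses)
print(f"Split-half correlation: {r:.2f}")
```

A correlation close to 1 suggests the two halves rank respondents consistently. The same `pearson` helper could compare scores from two administrations of one measure (test-retest) or from two versions of it (alternative form).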

A valid measure is one that measures what it is supposed to measure. In other words, validity is the degree of correspondence between the measure and the concept it is thought to measure. Validity is not as easy to demonstrate as reliability. The chapter discusses six tests of validity: face validity, content validity, construct validity, convergent construct validity, discriminant construct validity, and interitem association.

Each test confirms validity through comparison with the operational definition of a concept or with other measures of the concept of interest or related concepts. Measures that capture the full definition of the concept or produce results consistent with other measures are considered valid.
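
To make the idea of comparison with other measures concrete, here is a hedged sketch in Python. It assumes that agreement between measures is summarized with a Pearson correlation, which is one common choice rather than the chapter's stated procedure. All scores and variable names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for five respondents.
new_measure = [1, 2, 3, 4, 5]          # the measure being evaluated
established_measure = [2, 1, 4, 3, 5]  # another measure of the same concept
unrelated_measure = [3, 1, 4, 1, 3]    # a measure of a different concept

# Convergent evidence: the new measure should track the established one.
convergent_r = pearson(new_measure, established_measure)

# Discriminant evidence: it should not track a measure of something else.
discriminant_r = pearson(new_measure, unrelated_measure)

print(f"convergent r = {convergent_r:.2f}, discriminant r = {discriminant_r:.2f}")
```

In this made-up data the new measure correlates strongly with the established measure and not at all with the unrelated one, which is the pattern that would support convergent and discriminant construct validity.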

The reliability and validity of the measures used by political scientists are seldom demonstrated to everyone’s satisfaction—most measures are partially accurate.

The precision of a measure is captured by its level of measurement—the type of information a measure contains and the type of comparison or analysis that can be used with the measure across observations.

There are four levels of measurement ranging from the nominal level that contains the least amount of information and lends itself to the fewest analytical tools to the ratio level that contains the most information and lends itself to the most analytical tools:

The nominal level describes variables that indicate only a difference between categories.
At the ordinal level, categories may be ranked in order in addition to indicating a difference between categories.
The interval level includes all of the information of the preceding levels and adds meaningful intervals between values of the variable.
The ratio level adds a meaningful zero to the interval level. Ratio-level variables support the full range of arithmetic operations, including ratios between values, and can be used with most analytical tools.
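
The four levels can be illustrated with a short Python sketch showing a comparison that becomes meaningful at each level. The variables and values are hypothetical, and the pairing of each level with a single summary (mode, median, difference, ratio) is a simplification for illustration.

```python
from statistics import mean, median, mode

# Nominal: categories only differ, so counting the most common one is meaningful.
party = ["Democrat", "Republican", "Independent", "Democrat"]
most_common_party = mode(party)

# Ordinal: categories can be ranked, so a middle value is meaningful.
# Coding assumed here: 1 = low, 2 = medium, 3 = high political interest.
interest = [1, 2, 2, 3, 3]
typical_interest = median(interest)

# Interval: distances between values are meaningful, but zero is arbitrary.
election_years = [1992, 2000, 2008]
years_between = election_years[1] - election_years[0]
average_year = mean(election_years)

# Ratio: zero means "none", so one value can be a multiple of another.
ages = [20, 40, 60]
age_ratio = ages[1] / ages[0]

print(most_common_party, typical_interest, years_between, average_year, age_ratio)
```

Each level keeps the comparisons available at the levels before it: ages can also be averaged, ranked, or grouped into categories, whereas party labels cannot be ranked or averaged.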

Some concepts, such as age, can be satisfactorily captured with a single question. More complex concepts, such as ideology, may require a multi-item measure consisting of several questions that capture different components of the concept and therefore increase validity. The chapter discusses five forms of multi-item measures.

Researchers may construct a summation index by combining the scores on multiple questions to create one single measure of a concept.
A Likert scale builds a single score for each respondent from only those index questions that best differentiate between groups of respondents, such as liberals and conservatives.
A Guttman scale arranges answer choices in ordinal fashion so that respondents who agree with a higher-ranked answer will also agree with each lower-ranked answer.
Mokken scaling also analyzes respondents’ answers to multiple items to see whether respondents can be ordered on each item and whether the items themselves can be ordered.
Factor analysis allows researchers to uncover patterns across related measures to create summary variables that represent different dimensions of the same concept.
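
Two of the multi-item measures above are simple enough to sketch in a few lines of Python. The example assumes each item is coded 1 for agree and 0 for disagree, and that items are listed from lowest ranked to highest ranked. The function names and the data are hypothetical.

```python
def summation_index(answers):
    """Combine scores on several items into one score by adding them."""
    return sum(answers)


def fits_guttman_pattern(answers):
    """Check whether one respondent's answers form a Guttman pattern.

    Items are ordered from lowest ranked to highest ranked. The pattern
    holds when agreeing with any item implies agreeing with every
    lower-ranked item, so every 1 comes before every 0.
    """
    return all(lower >= higher for lower, higher in zip(answers, answers[1:]))


# Hypothetical respondents answering four items (1 = agree, 0 = disagree).
consistent = [1, 1, 0, 0]    # agrees with the two lowest-ranked items only
inconsistent = [1, 0, 1, 0]  # agrees with item 3 but not lower-ranked item 2

print(summation_index(consistent), fits_guttman_pattern(consistent))
print(summation_index(inconsistent), fits_guttman_pattern(inconsistent))
```

Both respondents receive the same summation index score of 2, but only the first fits the Guttman pattern. This shows why a summed score alone does not reveal whether the items form an ordered scale.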

Political Science Research Methods
