Learning Objectives

After completing your study of this chapter, you should be able to do the following:

  • Explain the importance of conducting a pilot test.
  • Describe how a pilot test should be set up and specify the types of information that should be collected.
  • Describe the collection, analyses, and interpretation of data for an item analysis, including item difficulty, item discrimination, interitem correlations, item–criterion correlations, item bias, and item characteristic curves.
  • Describe the collection and interpretation of data for a qualitative item analysis.
  • Identify and explain the criteria for retaining and dropping items to revise a test.
  • Describe the processes of validation and cross-validation.
  • Explain the concepts of differential validity, single-group validity, and unfair test discrimination.
  • Describe two different kinds of measurement bias.
  • Explain the purpose of a cut score, and describe two methods for identifying a cut score.

Chapter Summary

The pilot test is a scientific investigation of the new test’s reliability and validity for its specified purpose. Because the purpose of the pilot test is to study how well the test performs, the test should be given under conditions that match the actual circumstances in which it will be used. Therefore, developers choose a sample of test takers who resemble, or are drawn from, the test’s target audience. When the test is given, administrators must adhere strictly to the procedures outlined in the test instructions. In addition, test developers or administrators may use questionnaires or interviews to gather extra information about the respondents or the test.

Each item in a test is a building block that contributes to the test’s outcome or final score. Therefore, developers examine the performance of each item to identify items that perform well, revise those that could perform better, and eliminate those that do not yield the desired information. Developers analyze each item for its difficulty (the percentage of test takers who respond correctly), discrimination (how well it separates those who show a high degree of the construct from those who show little of it), correlation with other items (for reliability/precision) and with an outside criterion (for evidence of validity), and bias (whether it is easier for one group than for another). Item characteristic curves provide pictures of each item’s difficulty and discrimination; they can also indicate whether an item is biased against a subgroup of test takers. Test developers might also use individual or group discussions with test takers or experts to gather qualitative information about how to revise the test to improve its accuracy.
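To make the two most common quantitative indices concrete, the sketch below computes item difficulty and a corrected item–total discrimination index from a small matrix of scored responses. The response matrix and its values are invented for illustration; the chapter itself does not prescribe a particular computation.

```python
import numpy as np

# Hypothetical scored responses: rows are test takers, columns are items
# (1 = correct, 0 = incorrect). Invented data for illustration only.
responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
])

# Item difficulty: the proportion of test takers who answer each item
# correctly (higher values mean easier items).
difficulty = responses.mean(axis=0)

# Item discrimination (corrected item-total correlation): correlate each
# item with the total score computed from the remaining items, so that
# the item is not correlated with itself.
total = responses.sum(axis=1)
discrimination = np.array([
    np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]
    for j in range(responses.shape[1])
])

print("difficulty:", difficulty)
print("discrimination:", discrimination)
```

Consistent with the revision criteria the chapter describes, items with difficulty near 0 or 1 provide little information about individual differences, and items with low or negative discrimination are candidates for revision or removal.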

After the test has been revised, developers conduct the validation study by administering the test to another sample of people. Standards for the validation study are similar to those for designing the pilot study. The validation study provides data on the test’s reliability/precision, its correlation with any appropriate outside criteria such as performance evaluations (evidence of validity), and its correlation with other measures of the test’s construct (evidence of the construct that the test measures). The study also evaluates whether the test is equally valid for different subgroups of test takers. If the validation study provides sufficient evidence of reliability/precision and validity, the test developers conduct a final analysis called cross-validation, carried out either through a final administration of the test to yet another sample of test takers or, when resources for another administration are not available, through a regression-based statistical estimate of the expected decrease in the validity coefficients. A second administration can be expected to yield lower validity coefficients than the original validation study.
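As an illustration of the statistical route, the sketch below applies the Wherry correction, one classic regression-based shrinkage estimate. The chapter does not name a specific formula, and the numbers here are invented, so treat this as one plausible way to estimate the expected decrease.

```python
def wherry_adjusted_r2(r2: float, n: int, k: int) -> float:
    """Estimate the shrunken squared validity coefficient with the
    Wherry formula, where n is the validation sample size and k is
    the number of predictors in the regression."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Illustrative numbers only: an observed R^2 of .36 from a sample of
# 120 test takers with 5 predictors shrinks to roughly .332.
print(wherry_adjusted_r2(0.36, n=120, k=5))
```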

After validation is complete, test developers can develop norms (distributions of test scores used for interpreting an individual’s test score) and cut scores (decision points for dividing test scores into pass/fail groupings). Their development depends on the purpose of the test and how widely it is used.
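One widely used judgmental method for identifying a cut score is the Angoff method, in which subject matter experts estimate, item by item, the probability that a minimally qualified test taker would answer correctly. The sketch below averages a small panel’s ratings; the panel size and ratings are invented for illustration.

```python
import numpy as np

# Hypothetical Angoff ratings: each row is one expert judge, each column
# is one item; entries are the judged probability that a minimally
# qualified test taker answers that item correctly.
ratings = np.array([
    [0.70, 0.55, 0.80, 0.60],
    [0.65, 0.60, 0.75, 0.55],
    [0.75, 0.50, 0.85, 0.65],
])

# Each judge's cut score is the sum of that judge's item ratings (the
# expected raw score of a minimally qualified test taker); the panel's
# cut score is the mean across judges.
judge_cuts = ratings.sum(axis=1)
cut_score = judge_cuts.mean()
print(judge_cuts, cut_score)  # cut score of 2.65 out of 4 items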

At the end of the validation process, the test manual is assembled and finalized. Contents of the manual include the rationale for constructing the test, a history of the development process, the results of the validation studies, a description of the appropriate target audience, instructions for administering and scoring the test, and information on interpreting individual scores.