In order for assessments to be sound, they must be free of bias and distortion. Reliability and validity are two concepts that are important for defining and measuring bias and distortion. Reliability refers to the extent to which assessments are consistent. Just as we enjoy having reliable cars (cars that start every time we need them), we strive to have reliable, consistent instruments to measure student achievement. Another way to think of reliability is to imagine a kitchen scale. If you weigh five pounds of potatoes in the morning, and the scale is reliable, the same scale should register five pounds for the potatoes an hour later (unless, of course, you peeled and cooked them). Likewise, instruments such as classroom tests and national standardized exams should be reliable – it should not make any difference whether a student takes the assessment in the morning or afternoon; one day or the next. Another measure of reliability is the internal consistency of the items. For example, if you create a quiz to measure students’ ability to solve quadratic equations, you should be able to assume that if a student gets an item correct, he or she will also get other, similar items correct. The following table outlines three common reliability measures. | | | |
| | |
Type of Reliability| How to Measure|
Stability or Test-Retest| Give the same assessment twice, separated by days, weeks, or months. Reliability is stated as the correlation between scores at Time 1 and Time 2.| Alternate Form| Create two forms of the same test (vary the items slightly). Reliability is stated as correlation between scores of Test 1 and Test 2.| Internal Consistency (Alpha, a)| Compare one half of the test to the other half. Or, use methods such as Kuder-Richardson Formula 20 (KR20) or Cronbach's Alpha. |
| | |
reliability (systemic def.) is the ability of a person or system to perform and maintain its functions in routine...