Reliability and Validity

Published: November 1, 2009
Test Reliability and Validity:
Evaluation of the GRADE A+ Standardized Reading Assessment

Assessment is the key to instruction and intervention, but according to Salvia, Ysseldyke and Bolt (2007), “reliability is a major consideration in evaluating an assessment procedure” (p. 119). Reliability refers to the stability of a test’s results over time, and also to the consistency of the scores students would receive on alternate forms of the same test, for example Test Form A and Test Form B. If a test is reliable, one would expect a student to achieve the same score regardless of when the student completes the assessment; if it is not reliable, a student’s score may vary because of factors unrelated to the purpose of the assessment. An assessment is considered reliable when the same results occur regardless of when the assessment is given or who does the scoring, but a good assessment is not only reliable: it also minimizes as many factors as possible that could lead to misinterpretation of the test’s results. A test’s reliability matters for two reasons. First, reliability provides a measure of the extent to which a student’s score reflects random measurement error. If there is relatively little error, the ratio of true-score variance to obtained-score variance approaches a reliability index of 1.00 (perfect reliability); if there is a relatively large amount of error, the ratio of true-score variance to obtained-score variance approaches .00 (total unreliability) (Salvia et al., 2007, p. 121). Therefore, tests with good measures of reliability should be used to ensure that test scores reflect more than just random error. Second, reliability is a precursor to validity, which I will discuss in more detail later. Validity refers to the degree to which evidence supports the claim that the test interpretations are correct and that the manner in which these interpretations are used is appropriate and meaningful.
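The ratio of true-score variance to obtained-score variance can be illustrated with a short calculation. This is a minimal sketch that assumes the classical test theory model, in which an obtained score is a true score plus random error and the error is uncorrelated with the true score, so that obtained-score variance equals true-score variance plus error variance. The function name `reliability_index` and the variance values are hypothetical illustrations, not figures from Salvia et al. (2007) or the GRADE manual.

```python
def reliability_index(true_var: float, error_var: float) -> float:
    """Ratio of true-score variance to obtained-score variance.

    Assumes (classical test theory) obtained = true + error, with error
    uncorrelated with true scores, so obtained variance = true_var + error_var.
    """
    if true_var < 0 or error_var < 0:
        raise ValueError("variances must be non-negative")
    obtained_var = true_var + error_var
    if obtained_var == 0:
        raise ValueError("obtained-score variance must be positive")
    return true_var / obtained_var


# Hypothetical variance components (not taken from any real test):
# hold true-score variance fixed and let random error grow.
TRUE_VAR = 9.0
for error_var in (0.0, 1.0, 9.0, 100.0, 10_000.0):
    r = reliability_index(TRUE_VAR, error_var)
    print(f"error variance {error_var:>8.1f} -> reliability {r:.3f}")
```

With no error variance the index is 1.00, with error variance equal to true-score variance it is .50, and it falls toward .00 as error variance comes to dominate the obtained scores.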
However, a formal assessment of the validity of a specific use of a test can be a very lengthy process, which is why test reliability is often viewed as the first step in the test validation process. If a test is deemed unreliable, one need not spend time examining whether it is valid, because it will not be; if the test is deemed adequately reliable, however, a validation study would be worthwhile. The Group Reading Assessment and Diagnostic Evaluation (GRADE) is a normative diagnostic reading assessment that determines which skills students have mastered developmentally and where they need instruction. Chapter Four of the GRADE Technical Manual is organized into three sections: reliability, validation, and validity. I will evaluate only the first and last of these, reliability and validity. The first section presents reliability data for the standardization sample by test at 11 levels (P, K, 1-6, M, H and A) and 14 grade enrollment groups (Preschool-12th) to describe the consistency and stability of GRADE scores (Williams, 2001, p. 77). In this section, Williams addresses internal reliability, which concerns the consistency of the items within a test; alternate-form reliability, which is derived from administering two different but parallel test forms; test-retest reliability, which indicates how much a student’s score will change when time has elapsed between testings; and the standard error of measurement, which represents a band of error around the true score. The GRADE Technical Manual reports 132 reliabilities in Table 4.1, which presents the alpha and split-half total test reliabilities for the fall and spring. Of these, 99 were in the range of .95 to .99, which indicates a high degree of homogeneity among the items for each form, level, and grade enrollment group (Williams, 2001, p. 78). In the GRADE alternate-form reliability study (Table 4.14), 696 students were tested. The forms were given at different times and ranged...
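The reliability indices named in this section can be sketched in code. This is a minimal illustration under stated assumptions, not the procedure used in the GRADE Technical Manual: coefficient (Cronbach’s) alpha; a split-half coefficient that assumes an odd-even item split stepped up with the Spearman-Brown formula (the manual’s actual split is not described here); a Pearson correlation of the kind used for alternate-form or test-retest reliability; and the standard error of measurement computed as the standard deviation times the square root of one minus the reliability. All function names and the response data are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance


def cronbach_alpha(rows):
    """Coefficient alpha; rows = one list of item scores per student."""
    k = len(rows[0])  # number of items
    item_vars = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation, e.g. Form A vs. Form B scores for the same students."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def split_half(rows):
    """Odd-even split-half reliability, stepped up with Spearman-Brown (assumed split)."""
    first = [sum(row[0::2]) for row in rows]   # items 1, 3, 5, ...
    second = [sum(row[1::2]) for row in rows]  # items 2, 4, 6, ...
    r_half = pearson_r(first, second)
    return 2 * r_half / (1 + r_half)


def standard_error_of_measurement(total_scores, reliability):
    """SEM = SD * sqrt(1 - reliability), in raw-score units."""
    return pstdev(total_scores) * sqrt(1 - reliability)


# Hypothetical right/wrong (1/0) responses: 6 students x 6 items (not GRADE data).
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
totals = [sum(row) for row in responses]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
print(f"split-half (Spearman-Brown) = {split_half(responses):.2f}")
print(f"SEM = {standard_error_of_measurement(totals, alpha):.2f} raw-score points")
```

For this invented data alpha comes out at about .85 and the split-half estimate at about .75; the numbers say nothing about GRADE itself. For alternate-form reliability, one would pass each student’s Form A and Form B totals to `pearson_r`; for test-retest reliability, the totals from two testing occasions.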
