
Reliability and Validity

By KaySteve Nov 01, 2009 1829 Words
Test Reliability and Validity:
Evaluation of the GRADE A+ Standardized Reading Assessment

Assessment is the key to instruction and intervention, but according to Salvia, Ysseldyke, and Bolt (2007), “reliability is a major consideration in evaluating an assessment procedure” (p. 119). Reliability refers to the stability of a test’s results over time, and test reliability refers to the consistency of scores students would receive on alternate forms of the same test, for example Test Form A and Test Form B. If a test is reliable, one would expect a student to achieve the same score regardless of when the student completes the assessment; if it is not reliable, a student’s score may vary based on factors that are unrelated to the purpose of the assessment. An assessment is considered reliable when the same results occur regardless of when the assessment occurs or who does the scoring, but a good assessment is not only reliable; it also minimizes as many factors as possible that could lead to misinterpretation of the test’s results. It is important to be concerned with a test’s reliability for two reasons. First, reliability provides a measure of the extent to which a student’s score reflects random measurement error: “If there is relatively little error, the ratio of true-score variance to obtained-score variance approaches a reliability index of 1.00 (perfect reliability); if there is a relatively large amount of error, the ratio of true-score variance to obtained-score variance approaches .00 (total unreliability)” (Salvia et al., 2007, p. 121). Therefore, it is warranted to use tests with good measures of reliability to ensure that test scores reflect more than just random error. Second, reliability is a precursor to validity, which I will discuss in more detail later. Validity refers to the degree to which evidence supports that the test interpretations are correct and that the manner in which these interpretations are used is appropriate and meaningful.
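Classical test theory formalizes the ratio described above: reliability is the proportion of obtained-score variance attributable to true-score variance, and it can be estimated by correlating two parallel forms. A minimal sketch in Python, where the true-score SD of 15 and error SD of 7.5 are made-up assumptions, not figures from the GRADE manual:

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

random.seed(1)
n = 5000
true_scores = [random.gauss(100, 15) for _ in range(n)]   # latent ability

def administer(true):
    """Simulate one test form: true score plus independent random error."""
    return [t + random.gauss(0, 7.5) for t in true]

form_a, form_b = administer(true_scores), administer(true_scores)

# Reliability = true-score variance / obtained-score variance,
# which the parallel-forms correlation estimates directly.
theoretical = 15 ** 2 / (15 ** 2 + 7.5 ** 2)   # = 0.80
estimated = pearson(form_a, form_b)
print(round(theoretical, 2), round(estimated, 2))
```

With little error the ratio approaches 1.00 and with large error it approaches .00, matching the Salvia et al. description.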
However, a formal assessment of the validity of a specific use of a test can be a lengthy process, which is why test reliability is often viewed as the first step in the test validation process. If a test is deemed unreliable, one need not spend time examining whether it is valid, because it will not be; but if the test proves adequately reliable, a validation study would be worthwhile. The Group Reading Assessment and Diagnostic Evaluation (GRADE) is a normative diagnostic reading assessment that determines developmentally which skills students have mastered and where they need instruction. Chapter Four of the GRADE Technical Manual covers three sections: reliability, validation, and validity; I will evaluate only the first and last of these, reliability and validity. The first section presents reliability data for the standardization sample by test at 11 levels (P, K, 1-6, M, H, and A) and 14 grade-enrollment groups (preschool through 12th) to describe the consistency and stability of GRADE scores (Williams, 2001, p. 77). In this section, Williams addresses internal reliability, which concerns the consistency of the items in a test; alternate-form reliability, which is derived from the administration of two different but parallel test forms; test-retest reliability, which tells how much a student’s score will change after a period of time has elapsed between tests; and the standard error of measurement, which represents a band of error around the true score. The GRADE Technical Manual reports 132 reliabilities in Table 4.1, which presents the alpha and split-half total test reliabilities for the fall and spring. Of these, 99 were in the range of .95 to .99, which indicates a high degree of homogeneity among the items for each form, level, and grade-enrollment group (Williams, 2001, p. 78). In the GRADE alternate-form reliability study, Table 4.14, 696 students were tested.
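Split-half reliabilities like those in Table 4.1 are typically obtained by correlating two halves of a single form and stepping that half-length correlation up to the full test with the Spearman-Brown correction. A hedged sketch of the standard procedure; the item responses below are invented for illustration, not GRADE data:

```python
import statistics

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Hypothetical item responses for six students (1 = correct, 0 = wrong).
responses = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1, 1, 1],
]

# Score the odd and even items separately, correlate the halves,
# then project the half-test correlation to full-test length.
odd  = [sum(r[0::2]) for r in responses]
even = [sum(r[1::2]) for r in responses]
r_half = pearson(odd, even)
r_full = 2 * r_half / (1 + r_half)   # Spearman-Brown prophecy formula
print(round(r_full, 2))
```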
The forms were given at different times, with intervals ranging anywhere from eight to thirty-two days. The coefficients in the table ranged from .81 to .94, with half being higher than .89, indicating that Forms A and B are quite parallel (Williams, 2001, p. 85). In the GRADE test-retest reliability study, Table 4.15, 816 students were tested. All students were tested twice during the fall, with intervals ranging anywhere from three and a half to forty-two days. Form A of the various GRADE levels appeared similar in stability over time to performance on Form B. However, since most of the sampling was done with Form A, further investigation of the stability of scores with Form B may be warranted (Williams, 2001, p. 87). The standard errors of measurement listed in Table 4.16 of the GRADE were computed from Table 4.1. Due to the variation in total test reliability, the SEMs ranged from low to high, and because only the observed score, not the true score, can be measured directly, there will always be some doubt about one’s true score. Overall, it is reasonable to conclude that the reliability section of the GRADE Technical Manual provides a significant amount of evidence of consistency between test Forms A and B. As noted earlier, validity refers to the degree to which evidence supports that the test interpretations are correct and that the manner in which these interpretations are used is appropriate and meaningful. For a test to be fair, its contents and performance expectations should reflect knowledge and experiences that are common to all students. Therefore, according to Salvia et al. (2007), “validity is the most fundamental consideration in developing and evaluating tests” (p. 143).
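The SEMs in Table 4.16 follow from the classical relationship between a test’s score spread and its reliability: the SEM shrinks as reliability rises, and it defines a confidence band around an observed score. A minimal sketch; the SD of 15 and reliability of .95 are illustrative values, not figures from the manual:

```python
# Standard error of measurement (SEM) and a confidence band around an
# observed score. Both inputs below are hypothetical.
sd = 15.0           # standard deviation of the score scale
reliability = 0.95  # total test reliability (e.g., from alpha)

sem = sd * (1 - reliability) ** 0.5   # classical SEM formula
print(round(sem, 2))                  # 3.35

# Band of error around an observed score of 104:
observed = 104
band_68 = (observed - sem, observed + sem)             # roughly 68% band
band_95 = (observed - 1.96 * sem, observed + 1.96 * sem)
print(round(band_68[0], 1), round(band_68[1], 1))
```

Because the band can never shrink to zero for any reliability below 1.00, some doubt about the true score always remains, as the manual notes.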
A valid assessment should reflect actual knowledge or performance, not just test-taking skills or memorized equations and facts; it should not require knowledge or skills that are irrelevant to what is actually being assessed; and, moreover, it should be as free as possible of cultural, ethnic, and gender bias. The validity of an assessment is the extent to which the assessment measures what it was intended or designed to measure. The extent of a test’s validity determines (1) what inferences or decisions can be made based on test results and (2) the assurance one can have in those decisions (Williams, 2001, p. 92). Validation is the process of accumulating evidence that supports the appropriateness of student responses for the specified assessment, and because tests are used for various purposes, there is no single type of evidentiary validity that is apt for all purposes. Test validation can take many forms, both qualitative and quantitative, and in a case such as the GRADE, it can be a continuing process (Williams, 2001, p. 92). As stated previously, I am evaluating two sections from Chapter Four. Having completed the first, I turn to the last section, which deals with validity. In this section, Williams addresses content validity, which addresses the question of whether the test items adequately represent the area the test is supposed to measure; criterion-related validity, which addresses the relationship between scores on the test being validated and some criterion such as a rating scale, a classification, or another test score; and construct validity, which addresses the question of whether the test actually measures the construct, or trait, it purports to measure. The content validity section of the GRADE Technical Manual addresses 16 subtests in various skill areas of pre-reading and reading and documents that adequate content validity was built into the reading test as it was developed.
Therefore, if the appropriate decisions can be made, then the results are deemed valid and the test measures what it is supposed to measure. For the GRADE criterion-related studies, scores from other reading tests were used as the criteria, and the studies included both concurrent and predictive validity. For the concurrent validity study, the section compares GRADE Total Test scores to three group-administered tests and one individually administered test. These were administered in concurrence with the fall or spring administration of the GRADE, with data collected by numerous teachers throughout the U.S. and all correlations corrected using Guilford’s formula. The three group-administered tests given in concurrence with the GRADE Total Test suggested they all measured what they were supposed to, while the individually administered test showed evidence of discriminative and divergent validity. For the predictive validity study, the section compared how well the GRADE Total Test from the fall predicted performance on the reading subtest of a group-administered achievement test given in the spring. Three groups totaling 260 students were given the GRADE in the fall and the TerraNova in the spring of the same school year; the final samples were somewhat smaller because some of the students tested in the fall had moved, leaving 232, and the scores for both assessments were correlated and corrected using Guilford’s formula. Table 4.22 lists the corrected correlations between the GRADE and the TerraNova, which indicate that GRADE scores in the fall are predictive of TerraNova reading scores in the spring. The construct validity of the GRADE focuses on two aspects: convergent validity, shown by higher correlations, and divergent validity, shown by lower correlations.
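The manual attributes its corrections to “Guilford’s formula,” which this essay does not reproduce. As an assumption, the sketch below uses the classical correction for attenuation, a common psychometric adjustment that accounts for unreliability in both measures; it may or may not match the correction the manual actually applied, and all three input values are hypothetical:

```python
# Correction for attenuation (classical psychometrics). The observed
# correlation between two tests is dampened by measurement error in
# both; dividing by the geometric mean of their reliabilities
# estimates the correlation between the underlying true scores.
r_observed = 0.70   # e.g., a fall predictor vs. a spring criterion
r_xx = 0.95         # assumed reliability of the predictor test
r_yy = 0.90         # assumed reliability of the criterion test

r_corrected = r_observed / (r_xx * r_yy) ** 0.5
print(round(r_corrected, 2))   # 0.76
```

The corrected coefficient is always at least as large as the observed one, since unreliability can only attenuate an observed correlation.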
In the GRADE/PIAT-R study, shown in Table 4.21, convergent validity is demonstrated by the high correlation coefficients between the GRADE and PIAT-R reading scores, and divergent validity is demonstrated by the lower correlation between the GRADE and the PIAT-R general information subtest (Williams, 2001, p. 97). Performance on reading tasks is represented by the first set of correlations; in the second set, the GRADE represents performance on reading while the PIAT-R represents world knowledge. Convergent/divergent information was also provided for the GRADE/ITBS study shown in Table 4.23. Evidence of higher correlations for the GRADE’s convergent validity was provided by the ITBS reading subtest, while evidence of substantially lower correlations for the GRADE’s divergent validity was provided by the ITBS math subtest, which would be expected for divergent validity because the reading demand was minimal. Overall, the validity data provide a considerable amount of evidence that the GRADE measures what it purports to measure and that apt conclusions can be correctly drawn from the test. In my judgment, having evaluated the GRADE Technical Manual in the areas of reliability (internal, alternate-form, test-retest, and SEM) and validity (content, criterion-related, and construct), the content provided by the authors in the manual, cross-referenced with the content provided in the textbook, shows that the manual is consistent, reports acceptable correlation coefficients, and supports that the test measures what it is supposed to measure.

Salvia, J., Ysseldyke, J. E., & Bolt, S. (2007). Assessment in special and inclusive education (10th ed.). Boston: Houghton Mifflin Company.
Williams, K. T. (2001). Technical manual: Group Reading Assessment and Diagnostic Evaluation. Circle Pines, MN: American Guidance Service, Inc.
