Test of Reliability
| Application and Appropriateness
| Internal Consistency
| This measure of reliability is appropriate when trying to determine how much reliability changes when a test is shortened or lengthened (Cohen & Swerdlik, 2010). Here I am specifically referring to the Spearman-Brown formula as used to estimate internal consistency. A researcher could also use other measures of internal consistency suited to heterogeneous test items, such as inter-item consistency.
| The reliability of a test generally increases as items are added, provided the new items are comparable to the existing ones. One of the strengths of the Spearman-Brown formula is that it can estimate how much more or less reliable a test becomes as a researcher lengthens or shortens it. The formula can also be worked in reverse to tell a researcher how many items must be added to reach a target reliability coefficient.
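As a minimal sketch, the Spearman-Brown formula predicts the reliability of a test whose length is multiplied by a factor n as r' = n*r / (1 + (n - 1)*r), and solving that expression for n gives the reverse calculation described above. The Python below assumes the added or removed items are comparable to the existing ones; the function names and the example (a hypothetical 20-item test with a reliability of .70) are illustrative and are not taken from Cohen and Swerdlik.

```python
import math


def spearman_brown(r_original, n):
    """Predicted reliability after multiplying test length by n.

    n > 1 lengthens the test and n < 1 shortens it; the formula assumes the
    added or removed items are comparable to the existing ones.
    """
    return (n * r_original) / (1 + (n - 1) * r_original)


def length_factor_needed(r_original, r_desired):
    """Reverse form: the length factor n needed to reach r_desired."""
    return (r_desired * (1 - r_original)) / (r_original * (1 - r_desired))


# Hypothetical 20-item test with a reliability coefficient of .70
items, r = 20, 0.70
print(round(spearman_brown(r, 2), 3))    # 40 items -> 0.824
print(round(spearman_brown(r, 0.5), 3))  # 10 items -> 0.538
n = length_factor_needed(r, 0.90)
print(round(n, 2), math.ceil(items * n))  # 3.86 78
```

Under these assumptions the test would need roughly 78 comparable items to reach .90, while halving it lowers the predicted coefficient, which matches the point that whole-test estimates run higher than half-test ones.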
| The problem with using the Spearman-Brown formula to estimate internal consistency is that it is only effective with homogeneous test items, that is, items of comparable difficulty and length. Also, reliability estimates are higher when the formula is applied to the whole test than to half of it, which means that longer tests work better with this instrument.
| The split-half method of measuring reliability entails dividing a single test into two halves that can be compared in the same manner as parallel forms are compared. This approach is appropriate with odd-even splits or random assignment of items to halves, but it is most applicable when the halves are designed as mini-parallel forms of the same test. In this instance, each half is, “…as nearly equal as humanly possible—in format, stylistic, statistical, and related aspects” (Cohen & Swerdlik, 2010, p. 145).
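Split-half correlations are commonly adjusted with the Spearman-Brown formula, because each half is only half as long as the full test. The sketch below assumes items are scored 0/1, splits them odd-even rather than down the middle, correlates the two half-scores, and applies the correction with n = 2. The function names and the item responses are hypothetical, chosen only to illustrate the steps.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(item_scores):
    """Odd-even split-half reliability with the Spearman-Brown correction.

    item_scores holds one row per test-taker and one 0/1 entry per item,
    in the order the items appear on the test.
    """
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_half, even_half)
    # Each half is half the length of the full test, so correct with n = 2.
    r_full = (2 * r_half) / (1 + r_half)
    return r_half, r_full


# Hypothetical responses: six test-takers, eight items (1 = correct).
scores = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 0, 0, 0, 1, 0, 0, 0],
]
r_half, r_full = split_half_reliability(scores)
print(f"half-test r = {r_half:.2f}, corrected r = {r_full:.2f}")
# half-test r = 0.82, corrected r = 0.90
```

Alternating items between halves spreads any drift in difficulty or fatigue across both halves instead of loading it onto the second one.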
| The strength of this kind of measure is that it is less time-consuming and less cumbersome for test-takers than the parallel-forms method, while still providing a good measure of internal consistency. This type of measurement also helps keep in check intervening variables that might introduce error variance into the analysis, since both halves of the test are taken in a single sitting.
| However, this form of measuring reliability can also magnify some intervening variables: fatigue that is felt during the second half of the test but not the first, and differences in the difficulty or content of the items in the first half vs. the second half. For this reason, it is not advisable to simply split a test down the middle. The two halves should have the same content and difficulty of questions for the reliability estimate to be accurate.
| Test/retest reliability is applicable when the construct being measured is relatively stable over time, but it is inappropriate for constructs that are not (Cohen & Swerdlik, 2010). This is because test/retest reliability is based on giving the same test to the same people at two different times. If the construct being measured is expected to change over time, then differences in scores would reflect true variance rather than error variance, and it is error variance on which the reliability estimate is based. An example of this principle might be an achievement test measuring grammatical skills. If the test-taker completes a series of grammar lessons between the first and second administrations, the scores will vary, not because of error but because of the intervening variable of instruction. Test/retest reliability would be inappropriate in this situation.
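A test/retest coefficient is usually computed as the correlation between scores from the two administrations. The sketch below uses made-up scores for six test-takers and compares two hypothetical retests: one where the construct stayed stable, and one after grammar lessons that helped some test-takers far more than others. The function name and all data are illustrative assumptions.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for six test-takers on a grammar test.
time1 = [12, 15, 9, 18, 14, 10]
# Retest with no instruction in between: the construct is stable.
time2_stable = [13, 14, 10, 18, 15, 9]
# Retest after grammar lessons that produced very unequal gains.
time2_after_lessons = [20, 16, 18, 19, 15, 17]

r_stable = pearson_r(time1, time2_stable)
r_after_lessons = pearson_r(time1, time2_after_lessons)
print(f"stable construct: r = {r_stable:.2f}")             # r = 0.96
print(f"after grammar lessons: r = {r_after_lessons:.2f}")  # r = -0.03
```

Pearson's r is unaffected if every score rises by the same amount; it is the uneven change across test-takers that drags the coefficient down, and in this scenario that drop reflects true change rather than measurement error.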
| The strength of this measure of reliability lies in tests that “…employ outcome measures such as reaction time or perceptual judgment” (Cohen & Swerdlik, 2010, p. 143). This is because these types of psychometric traits do not vary greatly over time and are not sensitive to many types of intervening variables.
| The weakness of test/retest reliability is, of course, that the underlying constructs being tested can change over...