Fundamental Concepts in Language Testing (4)
Characteristics of Language Tests: Total Test Characteristics*
Hossein Farhady
University for Teacher Education
Iran University of Science and Technology
The first two articles in the series explained two fundamental concepts in language testing, namely the form of tests and the functions of tests. The third paper was devoted to the characteristics of an individual item. The processes of planning, preparing, reviewing, and pre-testing were discussed. In the pre-testing section of the previous article, the procedures for determining item characteristics, including item facility, item discrimination, and choice distribution, were also discussed. It should be clarified at the outset that if the items, which are the building blocks of a test, meet the criteria introduced earlier, the whole test will most likely be acceptable. However, the assumption that good items will necessarily produce a good test does not always hold. Test developers should go one step further and determine the characteristics of the total test. This article, therefore, focuses on total test characteristics, which include reliability, validity, and practicality.
Reliability is one of the most important characteristics of all tests in general, and of language tests in particular. In fact, an unreliable test is worth nothing. An example may help clarify the concept. Suppose a student took a grammar test comprising one hundred items and received a score of 90. Suppose further that the same student took the same test two days later and got a score of 45. Finally, suppose that the same student took the same test a third time and received a score of 70. What would you think of these scores? What would you think of the student? Assuming that the student’s knowledge of English cannot undergo drastic changes within such a short period, the best explanation is that something must be wrong with the test. How could you rely on a test that does not produce consistent scores? How could you make a sound decision on the basis of such test scores? This is the essence of the concept of reliability, i.e., producing consistent scores. Although the example above is an extreme case, it is not impossible. Reliability, then, can be technically defined as “the extent to which a test produces consistent scores at different administrations to the same or similar group of examinees”. If a test produced exactly the same scores at different administrations to the same group, that test would be perfectly reliable. Perfect reliability, however, does not exist in practice. Many factors influence test score reliability. These factors range from examinees' differing mental
and physical conditions to the precision of the test items, and to the administration as well as scoring procedures. Reliability, in short, is “the extent to which a test produces consistent scores”: the higher the extent, the more reliable the test. Statistically speaking, reliability is represented by the letter “r”, whose magnitude ranges between zero and one; zero and one represent the minimum and maximum degrees of test score reliability, respectively. It should be mentioned that “r” is an independent statistical concept. It has nothing to do with the content or the form of the test; it deals solely with the scores produced by a test. In fact, one can estimate “r” without having any information about the content of the test. Thus, when one talks about the reliability of a test, one refers to the scores and not to the content or the form of the test. Having understood the concept of reliability, one should next estimate “r”, which requires some statistical competency. In the...