VALIDITY AND RELIABILITY
For the statistical consultant working with social science researchers, the estimation of reliability and validity is a frequently encountered task. Measurement issues differ in the social sciences in that they concern the quantification of abstract, intangible, and unobservable constructs; in many instances, then, the meaning of quantities is only inferred. Let us begin with a general description of the paradigm we are dealing with. Most concepts in the behavioral sciences have meaning within the context of the theory they are part of. Each concept thus has an operational definition governed by the overarching theory. If a concept is involved in the testing of hypotheses to support the theory, it has to be measured. So the first decision the researcher faces is "how shall the concept be measured?", that is, the type of measure. At a very broad level the type of measure can be observational, self-report, interview, and so on. These types ultimately take the shape of a more specific form: observation of ongoing activity or of video-taped events; self-report measures such as questionnaires, which can be open-ended or closed-ended, or Likert-type scales; and interviews, which may be structured, semi-structured, or unstructured, and open-ended or closed-ended. Needless to say, each type of measure has specific issues that must be addressed to make the measurement meaningful, accurate, and efficient. Another important consideration is the population for which the measure is intended. This decision depends not so much on the theoretical paradigm as on the immediate research question at hand.
A third point that needs mentioning is the purpose of the scale or measure. What does the researcher want to do with the measure? Is it developed for a specific study, or with the anticipation of extensive use with similar populations? Once some of these decisions are made and a measure is developed, which is a careful and tedious process, the relevant questions to raise are "how do we know that we are indeed measuring what we want to measure?", since the construct we are measuring is abstract, and "can we be sure that if we repeated the measurement we would get the same result?" The first question relates to validity and the second to reliability. Validity and reliability are two important characteristics of behavioral measures and are referred to as psychometric properties. It is important to bear in mind that validity and reliability are not all-or-none issues but matters of degree.

Validity:
Very simply, validity is the extent to which a test measures what it is supposed to measure. The question of validity is raised in the context of the three points made above: the form of the test, the purpose of the test, and the population for whom it is intended. Therefore, we cannot ask the general question "Is this a valid test?" The question to ask is "how valid is this test for the decision that I need to make?" or "how valid is the interpretation I propose for the test?" We can divide the types of validity into logical and empirical.

Content Validity:
When we want to find out whether the entire content of the behavior/construct/area is represented in the test, we compare the test tasks with the content of the behavior. This is a logical method, not an empirical one. For example, if we want to test knowledge of American geography, it is not fair to have most questions limited to the geography of New England.

Face Validity:
Basically, face validity refers to the degree to which a test appears to measure what it purports to measure.

Criterion-Oriented or Predictive Validity:
When you expect a future performance based on the scores currently obtained by the measure, correlate the obtained scores with that later performance. The later performance is called the criterion and the current score is the predictor. This is an empirical check on the value...
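The correlation just described can be sketched in a few lines: the Pearson correlation between current test scores and the later criterion is the test's validity coefficient. The data and names below are hypothetical illustrations, not part of the original text.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: current aptitude-test scores (the predictor) and
# later performance ratings (the criterion).
test_scores = [52, 61, 70, 75, 80, 85, 90]
criterion = [2.1, 2.8, 3.0, 3.4, 3.3, 3.9, 4.2]

# The correlation between predictor and criterion is the validity coefficient.
validity_coefficient = pearson_r(test_scores, criterion)
print(f"validity coefficient r = {validity_coefficient:.2f}")
```

The closer the coefficient is to 1, the better the current scores predict the later criterion; a coefficient near 0 would indicate the test has little predictive value for that criterion.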