Calculating, Interpreting, and Reporting Cronbach’s Alpha Reliability Coefficient for Likert-Type Scales

Joseph A. Gliem
Rosemary R. Gliem
Abstract: The purpose of this paper is to show why single-item questions pertaining to a construct are not reliable and should not be used in drawing conclusions. By comparing the reliability of a summated, multi-item scale with that of a single-item question, the authors show how unreliable a single item is; it is therefore not appropriate to make inferences based on the analysis of single-item questions that are used to measure a construct.
Introduction

Information gathered in the social sciences, marketing, medicine, and business about attitudes, emotions, opinions, personalities, and descriptions of people’s environment often involves the use of Likert-type scales. When individuals attempt to quantify constructs that are not directly measurable, they often use multiple-item scales and summated ratings to quantify the construct(s) of interest. The Likert scale’s invention is attributed to Rensis Likert (1931), who described this technique for the assessment of attitudes. McIver and Carmines (1981) describe the Likert scale as follows:

A set of items, composed of approximately an equal number of favorable and unfavorable statements concerning the attitude object, is given to a group of subjects. They are asked to respond to each statement in terms of their own degree of agreement or disagreement. Typically, they are instructed to select one of five responses: strongly agree, agree, undecided, disagree, or strongly disagree. The specific responses to the items are combined so that individuals with the most favorable attitudes will have the highest scores while individuals with the least favorable (or unfavorable) attitudes will have the lowest scores. While not all summated scales are created according to Likert’s specific procedures, all such scales share the basic logic associated with Likert scaling. (pp. 22-23)

Spector (1992) identified four characteristics that make a scale a summated rating scale as follows:

First, a scale must contain multiple items. The use of summated in the name implies that multiple items will be combined or summed. Second, each individual item must measure something that has an underlying, quantitative measurement continuum. In other words, it measures a property of something that can vary quantitatively rather than qualitatively.
Refereed Paper: Gliem & Gliem

An attitude, for example, can vary from being very favorable to being very unfavorable. Third, each item has no “right” answer, which makes the summated rating scale different from a multiple-choice test. Thus summated rating scales cannot be used to test for knowledge or ability. Finally, each item in a scale is a statement, and respondents are asked to give rating about each statement. This involves asking subjects to indicate which of several response choices best reflects their response to the item. (pp. 1-2)

Nunnally and Bernstein (1994), McIver and Carmines (1981), and Spector (1992) discuss the reasons for using multi-item measures instead of a single item for measuring psychological attributes. They identify the following reasons. First, individual items have considerable random measurement error; that is, they are unreliable. Nunnally and Bernstein (1994) state, “Measurement error averages out when individual scores are summed to obtain a total score” (p. 67). Second, an individual item can only categorize people into a relatively small number of groups; that is, individual items lack precision and cannot discriminate among fine degrees of an attribute. For example, a dichotomously scored item can distinguish between only two levels of the attribute. Third, individual items lack scope. McIver and Carmines (1981) say, “It is very unlikely that a single item can fully represent...