Research Report

Automated Essay Scoring With E-rater® v.2.0

Yigal Attali Jill Burstein

Research & Development

November 2005 RR-04-45


As part of its educational and social mission, and in fulfilling the organization's nonprofit charter and bylaws, ETS has learned from, and continues to lead, research that furthers educational and measurement science to advance quality and equity in education and assessment for all users of the organization's products and services. ETS Research Reports provide preliminary and limited dissemination of ETS research prior to publication. To obtain a PDF or a print copy of a report, please visit:

Copyright © 2005 by Educational Testing Service. All rights reserved. EDUCATIONAL TESTING SERVICE, E-RATER ETS, the ETS logo, and TOEFL are registered trademarks of Educational Testing Service. TEST OF ENGLISH AS A FOREIGN LANGUAGE is a trademark of Educational Testing Service. CRITERION is a service mark of Educational Testing Service. GMAT and the GRADUATE MANAGEMENT ADMISSION Test are registered trademarks of the Graduate Management Admission Council.

Automated Essay Scoring With E-rater® v.2.0

Yigal Attali and Jill Burstein ETS, Princeton, NJ

Abstract

E-rater® has been used by ETS for automated essay scoring since 1999. This paper describes a new version of e-rater (v.2.0) that differs from the previous one (v.1.3) with regard to the feature set and model building approach. The paper describes the new version, compares the new and previous versions in terms of performance, and presents evidence on the validity and reliability of scores produced by the new version.

Key words: Automated essay scoring, e-rater, CriterionSM


E-rater® has been used by ETS for automated essay scoring since February 1999. Burstein, Chodorow, and Leacock (2003) described the operational system put in use for scoring the Graduate Management Admission Test® Analytical Writing Assessment (GMAT® AWA) and for essays submitted to ETS's writing instruction application, CriterionSM Online Essay Evaluation Service. Criterion is a Web-based service developed by ETS to evaluate a student's writing skill and provide instantaneous score reporting and diagnostic feedback. Criterion contains two complementary applications. The scoring application uses the e-rater engine. The second application, Critique, is a suite of programs that evaluates and provides feedback on errors in grammar, usage, and mechanics; identifies the essay's discourse structure; and recognizes undesirable stylistic features. Additional details on these programs are provided in the description of the new version of e-rater that follows.

The operational system of e-rater (v.1.3) was trained on a sample of essays written on the same topic that had been scored by human readers. It measured more than 50 features in all and then computed a stepwise linear regression to select those features that made a significant contribution to the prediction of essay scores. For each essay question, or prompt, the result of training was a regression equation that could be applied to the features of a new essay written on the same topic to produce a predicted value. This value was rounded to the nearest whole number to yield the score.

This paper describes a newer automated essay scoring system, referred to here as e-rater version 2.0 (e-rater v.2.0). The new system differs from e-rater v.1.3 with regard to the feature set used in scoring, the model-building approach, and the final score-assignment algorithm. These differences result in an improved automated essay-scoring system.
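The prompt-specific model building described above can be sketched as forward stepwise linear regression followed by rounding. The code below is a minimal illustration, not ETS's implementation: the feature matrix, the function names, and the fixed F-to-enter threshold of 4.0 (roughly p < .05 for large samples) are all assumptions made for the sketch.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns coefficients and RSS."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return beta, float(resid @ resid)

def forward_stepwise(X, y, f_enter=4.0):
    """Greedily add the feature with the largest partial F, while F > f_enter.

    f_enter is a hypothetical fixed threshold standing in for the
    significance test used in a real stepwise procedure.
    """
    n, p = X.shape
    selected, remaining = [], list(range(p))
    _, rss_cur = fit_ols(np.empty((n, 0)), y)  # intercept-only baseline RSS
    while remaining:
        best_f, best_j, best_rss = 0.0, None, None
        for j in remaining:
            cols = selected + [j]
            _, rss_new = fit_ols(X[:, cols], y)
            df_resid = n - len(cols) - 1
            if df_resid <= 0 or rss_new <= 0:
                continue
            # Partial F for adding one predictor to the current model
            f = (rss_cur - rss_new) / (rss_new / df_resid)
            if f > best_f:
                best_f, best_j, best_rss = f, j, rss_new
        if best_j is None or best_f < f_enter:
            break  # no remaining feature makes a significant contribution
        selected.append(best_j)
        remaining.remove(best_j)
        rss_cur = best_rss
    return selected

def score_essay(features, selected, beta):
    """Apply the prompt-specific equation, then round to a whole score."""
    pred = beta[0] + features[selected] @ beta[1:]
    return int(round(pred))
```

Training on essays for one prompt yields `selected` and `beta`; scoring a new essay on the same prompt applies the equation to that essay's feature vector and rounds, mirroring the per-prompt workflow described above.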
The New Feature Set

The development of the new feature set used with e-rater v.2.0 was based on information extracted from e-rater v.1.3 and from the qualitative...

References

Breland, H. M., & Gaynor, J. L. (1979). A comparison of direct and indirect assessments of writing skill. Journal of Educational Measurement, 16, 119-128.

Breland, H. M., Jones, R. J., & Jenkins, L. (1994). The College Board vocabulary study (College Board Rep. No. 94-4; ETS RR-94-26). New York: College Entrance Examination Board.

Burstein, J., Chodorow, M., & Leacock, C. (2003, August). CriterionSM online essay evaluation: An application for automated evaluation of student essays. In J. Riedl & R. Hill (Eds.), Proceedings of the Fifteenth Annual Conference on Innovative Applications of Artificial Intelligence, Acapulco, Mexico (pp. 3-10). Menlo Park, CA: AAAI Press.

Burstein, J., Marcu, D., & Knight, K. (2003). Finding the WRITE stuff: Automatic identification of discourse structure in student essays. IEEE Intelligent Systems: Special Issue on Natural Language Processing, 18(1), 32-39.

Haberman, S. (2004). Statistical and measurement properties of features used in essay assessment (ETS RR-04-21). Princeton, NJ: ETS.

Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18, 613-620.

Swets, J. A. (1996). Signal detection theory and ROC analysis in psychology and diagnostics: Collected papers. Mahwah, NJ: Lawrence Erlbaum Associates.