CHAPTER 2: REVIEW OF RELATED LITERATURE AND STUDY
To evaluate syntactic variety, a parser identifies syntactic structures such as subjunctive auxiliary verbs and a variety of clausal structures, including complement, infinitive, and subordinate clauses. e-Grader parses the essay to identify the syntactic structures in which a term must appear to be considered a discourse marker. For example, for “first” to be considered a discourse marker, it cannot be a nominal modifier, as in “The first time that I saw her...” where “first” modifies the noun “time.” Instead, “first” must act as an adverbial conjunct, as in “First, it has often been noted...”

To capture an essay’s topical content, e-Grader uses content vector analyses based on the vector-space model (Salton, Wong, and Yang 1975). A set of training essays is converted into vectors of word frequencies. These vectors are then transformed into word weights, where the weight of a word is directly proportional to its frequency in the essay but inversely related to the number of essays in which it appears. To compute the topical analysis of a new essay, the essay is converted into a vector of word weights, and a search is conducted to find the training vectors most similar to it. Similarity is measured by the cosine of the angle between two vectors.

For one feature, topical analysis by essay, the test vector consists of all the words in the essay. The value of the feature is the mean of the scores of the most similar training vectors. The other feature, topical analysis by argument, evaluates vocabulary usage at the argument level. e-Grader uses a lexicon of cue terms and associated heuristics to automatically partition essays into component arguments or discussion points, and a vector is created for each. Each argument vector is compared to the training set to assign a topical analysis score to each argument. The value for this feature is a mean of the...
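The topical analysis by essay described above can be illustrated with a minimal Python sketch. It is not e-Grader's actual implementation: the function names, the whitespace tokenizer, the weighting formula (term frequency multiplied by log(N / document frequency)), and the choice of k = 2 nearest training essays are all assumptions made for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    return text.lower().split()


def build_idf(train_texts):
    # Inverse document frequency: words found in fewer training essays weigh more.
    n = len(train_texts)
    doc_freq = Counter()
    for text in train_texts:
        doc_freq.update(set(tokenize(text)))
    return {word: math.log(n / df) for word, df in doc_freq.items()}


def weight_vector(text, idf):
    # Weight grows with in-essay frequency and shrinks with document frequency.
    # Words never seen in training are dropped.
    term_freq = Counter(tokenize(text))
    return {w: f * idf[w] for w, f in term_freq.items() if w in idf}


def cosine(u, v):
    # Cosine of the angle between two sparse vectors stored as dicts.
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def topical_score(essay, train_texts, train_scores, k=2):
    # Mean human score of the k training essays most similar to the test essay.
    # Assumption: k is a free parameter; the source does not state its value.
    idf = build_idf(train_texts)
    test_vec = weight_vector(essay, idf)
    ranked = sorted(
        ((cosine(test_vec, weight_vector(t, idf)), s)
         for t, s in zip(train_texts, train_scores)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    top = ranked[:k]
    return sum(score for _, score in top) / len(top)
```

Under the same assumptions, topical analysis by argument would call `topical_score` once per argument segment and average the results; the segmentation itself depends on the cue-term lexicon and heuristics, which are not sketched here.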