Proceedings of the 16th EAMT Conference, 28-30 May 2012, Trento, Italy

Crowd-based MT Evaluation for non-English Target Languages
Michael Paul and Eiichiro Sumita
NICT
Hikaridai 3-5
619-0289 Kyoto, Japan

Luisa Bentivogli and Marcello Federico
FBK-irst
Via Sommarive, 18
38123 Povo-Trento, Italy

.@nict.go.jp

{bentivo,federico}@fbk.eu

Abstract
This paper investigates the feasibility of using crowdsourcing services for the human assessment of the quality of machine translations into non-English target languages. Non-expert graders are hired through the CrowdFlower interface to Amazon’s Mechanical Turk to carry out a ranking-based MT evaluation of utterances taken from the travel conversation domain for 10 Indo-European and Asian languages. The collected human assessments are analyzed in terms of worker characteristics, evaluation costs, and evaluation quality, measured as the agreement between non-expert graders and expert/oracle judgments. Moreover, data quality control mechanisms, including “locale qualification”, “qualification testing”, and “on-the-fly verification”, are investigated in order to increase the reliability of the crowd-based evaluation results.
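The abstract measures evaluation quality by the agreement between non-expert graders and expert/oracle judgments, but this excerpt does not state which agreement statistic is used. As an illustration only, the sketch below follows a convention used in WMT shared-task ranking evaluations (an assumption here, not necessarily this paper's method): each ranking of system outputs is decomposed into pairwise better/worse/tie judgments, and a kappa coefficient is computed with chance agreement P(E) = 1/3. The data format (a dict mapping hypothetical system names to ranks) and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical format: system name -> rank (1 = best; tied outputs share a rank).
Ranking = Dict[str, int]


def pairwise_relations(ranking: Ranking) -> Dict[Tuple[str, str], str]:
    """Decompose one ranking into '<' (first better), '>' (second better) or '=' (tie)."""
    relations = {}
    # Sorting the keys makes pair identifiers identical across graders.
    for a, b in combinations(sorted(ranking), 2):
        if ranking[a] < ranking[b]:
            relations[(a, b)] = "<"
        elif ranking[a] > ranking[b]:
            relations[(a, b)] = ">"
        else:
            relations[(a, b)] = "="
    return relations


def pairwise_kappa(grader: List[Ranking], expert: List[Ranking], p_e: float = 1 / 3) -> float:
    """Kappa over pairwise comparisons; p_e = 1/3 assumes three equally likely outcomes."""
    agree = total = 0
    # Each list element holds the rankings for one evaluated utterance.
    for g, e in zip(grader, expert):
        g_rel, e_rel = pairwise_relations(g), pairwise_relations(e)
        for pair, rel in e_rel.items():
            if pair in g_rel:  # compare only pairs ranked by both
                total += 1
                if g_rel[pair] == rel:
                    agree += 1
    if total == 0:
        raise ValueError("no overlapping system pairs to compare")
    p_a = agree / total  # observed agreement P(A)
    return (p_a - p_e) / (1 - p_e)


if __name__ == "__main__":
    expert = [{"sysA": 1, "sysB": 2, "sysC": 3}]
    grader = [{"sysA": 1, "sysB": 3, "sysC": 2}]
    # Two of the three pairs agree, so P(A) = 2/3.
    print(round(pairwise_kappa(grader, expert), 3))  # 0.5
```

In the usage example the grader swaps the two lower-ranked systems, so two of three pairwise judgments agree with the expert and kappa is 0.5; identical rankings give 1.0. Working on pairs rather than whole rankings lets partial agreement count, which suits ranking tasks where graders often agree on the best output but not on the order of the rest.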

1 Introduction

This paper focuses on the evaluation of machine translation (MT) quality for target languages other than English. Although human evaluation of MT output provides the most direct and reliable assessment, it is time-consuming, costly, and subjective. Various automatic evaluation measures have been proposed to make the evaluation of MT output cheaper and faster (Przybocki et al., 2008), but automatic metrics have not yet proved able to consistently predict the usefulness of MT technologies. To reduce the high cost of human assessment of MT output, the use of crowdsourcing services such as Amazon’s Mechanical Turk (MTurk) and CrowdFlower (CF) has recently been proposed (Callison-Burch, 2009; Callison-Burch et al., 2010; Denkowski and
