Natural Language Generation (NLG) is the task of generating text from various inputs, including graphs, text, speech, etc. With the breakthrough of pre-trained language models such as ChatGPT , how to evaluate the quality of machine-generated text has attracted growing interest in the artificial intelligence research community. Due to the high cost of human judgment, automatic evaluation metrics originally developed for machine translation are widely used for most text generation tasks, e.g., graph-to-text generation.
In this work, you are expected to conduct an extensive survey of evaluation metrics for text generation tasks, e.g., BLEU, METEOR, and ROUGE. By implementing these metrics, you will evaluate the quality of machine-generated text and analyze the reliability of the metrics. An example of research on evaluation metrics can be found in .
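As a starting point, the core idea behind BLEU (modified n-gram precision combined with a brevity penalty) can be sketched in a few lines of Python. This is a simplified illustration with uniform n-gram weights and no smoothing, not a substitute for established implementations such as sacreBLEU or the HuggingFace `evaluate` library:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(reference, candidate, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    ref, cand = reference.split(), candidate.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clipped overlap: each candidate n-gram is credited at most
        # as often as it appears in the reference.
        overlap = sum((cand_counts & ref_counts).values())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # without smoothing, any zero precision zeroes the score
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * geo_mean

print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```

A perfect match scores 1.0, while a candidate sharing no n-grams with the reference scores 0.0; comparing such scores against human judgments is exactly the kind of reliability analysis this project calls for.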
• Solid programming skills (e.g. Python).
• Strong interest in reading literature in the area of natural language generation.
• Experience with pre-trained language models or the HuggingFace library is a plus.