
Version of 14 December 2022, 11:34



An Extensive Survey of Automatic Evaluation Metrics for Text Generation




Information about the Thesis

Thesis type: Bachelor, Master
Supervisor: Shuzhou Yuan
Research group: Web Science

Archive number: 4983
Thesis status: Open
Start: 14 December 2022
Submission: unknown

Further Information

Background

Natural Language Generation (NLG) is the task of generating text from various inputs, including graphs, text, and speech [1]. With the breakthrough of pre-trained language models such as ChatGPT [2], how to evaluate the quality of machine-generated text has aroused interest in the artificial intelligence research community. Due to the high cost of human judgement, the automatic evaluation metrics of machine translation are widely used for most text generation tasks, e.g. graph-to-text generation.

Goal

In this work, you are expected to conduct an extensive survey of the evaluation metrics for text generation tasks, such as BLEU, METEOR, and ROUGE. Based on implementations of the metrics, you will evaluate the quality of machine-generated text and analyze the reliability of the metrics. An example of research on evaluation metrics can be found in [3].
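To give a flavour of what such an implementation involves, the core quantity behind BLEU, clipped n-gram precision, can be sketched in a few lines of plain Python. This is an illustrative simplification for this topic description, not the full BLEU metric (which combines several n-gram orders with a brevity penalty) and not an official implementation:

```python
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision, the building block of BLEU:
    each candidate n-gram count is clipped by its count in the
    reference, then the clipped total is divided by the number
    of n-grams in the candidate."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return clipped / total if total else 0.0

candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
print(modified_precision(candidate, reference, 1))  # unigram precision
print(modified_precision(candidate, reference, 2))  # bigram precision
```

ROUGE-N is the recall-oriented analogue of the same idea (dividing by the reference's n-gram count instead), which is one reason comparing the reliability of these metrics is interesting.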


Prerequisites

• Solid programming skills (e.g. Python).

• Strong interest in natural language processing, especially natural language generation.

• Experience with pre-trained language models or the HuggingFace library is a plus.


[1] https://www.jair.org/index.php/jair/article/view/11173/26378

[2] https://openai.com/blog/chatgpt/

[3] https://arxiv.org/pdf/2107.10821.pdf