Natural Language Generation (NLG) is the task of generating text from various inputs, including graphs, text, speech, etc. With the breakthrough of pre-trained language models such as ChatGPT , how to evaluate the quality of machine-generated text has attracted growing interest in the artificial intelligence research community. Due to the high cost of human judgment, automatic evaluation metrics originally developed for machine translation are widely used for most text generation tasks, e.g., graph-to-text generation.
In this work, you are expected to conduct an extensive survey of evaluation metrics for text generation tasks, e.g., BLEU, METEOR, and ROUGE. By implementing these metrics, you will evaluate the quality of machine-generated text and analyze the reliability of the metrics. An example of research on evaluation metrics can be found in .
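As a starting point, the core idea behind BLEU (modified n-gram precision combined with a brevity penalty) can be sketched in a few lines of Python. This is a simplified illustration with uniform n-gram weights and no smoothing, not a substitute for established implementations such as sacreBLEU or the HuggingFace `evaluate` library:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(reference, candidate, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    ref, cand = reference.split(), candidate.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clipped overlap: each candidate n-gram is credited at most
        # as often as it appears in the reference.
        overlap = sum((cand_counts & ref_counts).values())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # without smoothing, any zero precision zeroes the score
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * geo_mean

print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```

A perfect match scores 1.0, while a candidate sharing no n-grams with the reference scores 0.0; comparing such scores against human judgments is exactly the kind of reliability analysis this project calls for.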
• Solid programming skills (e.g. Python).
• Strong interest in reading literature in the area of natural language generation.
• Experience with pre-trained language models or the HuggingFace library is a plus.