TruthfulLM: Verifying and Ensuring Truthfulness in Large Language Models
This research project focuses on improving the factual correctness of text generated by language models such as ChatGPT. The prevailing approach to improving the quality of generated text, reinforcement learning from human feedback (RLHF), does not directly optimize for factual accuracy and addresses hallucination only indirectly. The risk of relying solely on RLHF to develop better models is that it may make misinformation appear legitimate rather than preventing it. The central objective of this project is therefore to develop and evaluate methods that continuously check the output of language models for factual correctness and automatically correct any inaccuracies.

The proposed approach builds on a previous micro-project by Aleph Alpha and KIT-AIFB, in which structured information was extracted from generated text and compared against a knowledge graph to verify its accuracy. When a hallucination is detected, the method corrects the inaccuracy using knowledge-graph-based decoding strategies. This approach can be applied to pre-trained language models without further training, which significantly increases efficiency and applicability, since training is the most energy- and cost-intensive part of model development.
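To make the verify-and-correct idea concrete, the following is a minimal sketch of the two steps described above: extracting claims as (subject, relation, object) triples, checking them against a knowledge graph, and substituting a knowledge-graph-consistent value when a mismatch is found. All names here (the toy knowledge graph, `verify_triples`, `correct_triple`) are illustrative assumptions, not the project's actual pipeline, and the correction step is a crude stand-in for the knowledge-graph-based decoding strategies mentioned in the text.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Toy knowledge graph: a set of ground-truth (subject, relation, object) triples.
# A real system would query a large external graph instead.
KNOWLEDGE_GRAPH: set = {
    ("KIT", "locatedIn", "Karlsruhe"),
    ("Karlsruhe", "locatedIn", "Germany"),
}


def verify_triples(extracted: List[Triple]) -> List[Tuple[Triple, bool]]:
    """Mark each extracted triple as supported (True) or a potential hallucination (False)."""
    return [(t, t in KNOWLEDGE_GRAPH) for t in extracted]


def correct_triple(triple: Triple) -> Triple:
    """Replace an unsupported object with the knowledge graph's value for (subject, relation).

    This stands in for the decoding-time correction: instead of re-ranking tokens
    during generation, it simply swaps in the first object the graph supports.
    """
    s, r, o = triple
    candidates = [obj for (ks, kr, obj) in KNOWLEDGE_GRAPH if ks == s and kr == r]
    if candidates and o not in candidates:
        return (s, r, candidates[0])
    return triple


if __name__ == "__main__":
    # Triples as they might be extracted from generated text.
    claims = [
        ("KIT", "locatedIn", "Karlsruhe"),  # supported by the graph
        ("KIT", "locatedIn", "Munich"),     # not in the graph: flagged and corrected
    ]
    for triple, supported in verify_triples(claims):
        if supported:
            print(triple, "-> supported")
        else:
            print(triple, "-> flagged, corrected to", correct_triple(triple))
```

Because verification and correction operate purely on the model's output, this kind of check can wrap any pre-trained model without retraining it, which is the efficiency argument made above.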