Large language models, such as GPT-3, are capable of generating natural language outputs in response to a prompt and have been shown to perform well on few-shot learning tasks. One key issue that makes the use of these models impractical, however, is their size: in order to use models with hundreds of billions of parameters, specialized and expensive hardware is needed. For example, the weights of GPT-3 require hundreds of GB of GPU memory, while current high-end consumer GPUs typically have a maximum of 24 GB of memory.
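To make the size argument concrete, here is a back-of-the-envelope estimate, assuming the published 175-billion-parameter count of GPT-3 and 16-bit weight storage (both figures are assumptions added here for illustration):

```python
# Back-of-the-envelope estimate of the GPU memory needed just to hold
# GPT-3's weights, assuming 175B parameters stored as 16-bit floats.
n_params = 175e9        # GPT-3 parameter count (assumption, not stated above)
bytes_per_param = 2     # fp16 / bf16 storage
weights_gib = n_params * bytes_per_param / 1024**3
print(f"~{weights_gib:.0f} GiB of weights vs. 24 GiB on a high-end consumer GPU")
# prints: ~326 GiB of weights vs. 24 GiB on a high-end consumer GPU
```

This ignores activations, optimizer state, and batching, so actual requirements are even higher.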
The idea of data augmentation is to artificially increase the amount of training data available for a given task by automatically creating new training examples. The larger training corpus can then be used to train a considerably smaller model for the task at hand. Starting from a small set of labeled examples, the goal of the proposed thesis is to use a large language model, such as GPT-NeoX-20B, to perform data augmentation for relation extraction, the task of detecting relations between entities (such as persons or organizations) mentioned in a text.
An example of GPT-3 being used for data augmentation can be found in .
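As a rough illustration of this setup, the following sketch prompts a causal language model with a handful of labeled relation-extraction examples and samples a new candidate example. It assumes the Hugging Face `transformers` library and the public `EleutherAI/gpt-neox-20b` checkpoint; the prompt format, seed examples, and relation labels are invented for illustration, not a prescribed method.

```python
# A minimal sketch of few-shot data augmentation for relation extraction.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "EleutherAI/gpt-neox-20b"  # ~40 GB in fp16; needs multiple GPUs or offloading

# Small set of labeled seed examples: (sentence, head entity, tail entity, relation).
seed_examples = [
    ("Steve Jobs co-founded Apple in 1976.", "Steve Jobs", "Apple", "founder_of"),
    ("Marie Curie worked at the University of Paris.", "Marie Curie",
     "University of Paris", "employed_by"),
]

def build_prompt(examples):
    """Turn labeled examples into a few-shot prompt asking for one more example."""
    lines = ["Generate sentences expressing a relation between two entities.\n"]
    for sent, head, tail, rel in examples:
        lines.append(f"Sentence: {sent}\nHead: {head}\nTail: {tail}\nRelation: {rel}\n")
    lines.append("Sentence:")  # the model continues from here with a new example
    return "\n".join(lines)

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")

inputs = tokenizer(build_prompt(seed_examples), return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=60, do_sample=True, temperature=0.9)
# Decode only the newly generated tokens, i.e. the candidate synthetic example.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```

In practice, each generated continuation would be parsed back into a (sentence, head, tail, relation) tuple and quality-filtered (e.g., checking that both entities actually appear in the sentence) before being added to the training set for the smaller model.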
Hands-on experience in machine learning and no fear of implementing neural network models (under the guidance of the supervisors).