Consistency Guided Knowledge Retrieval and Denoising in LLMs for Zero-shot Document-level Relation Triplet Extraction

Document-level Relation Triplet Extraction (DocRTE) is a fundamental task in information systems that aims to simultaneously extract entities and the semantic relations between them from a document. Existing methods rely heavily on large amounts of fully labeled data. However, collecting and annotating data for newly emerging relations is time-consuming and labor-intensive. Recent advanced Large Language Models (LLMs), such as ChatGPT and LLaMA, exhibit impressive long-text generation capabilities, inspiring us to explore an alternative approach for obtaining auto-labeled documents with new relations. In this paper, we propose GenRDK, a Zero-shot Document-level Relation Triplet Extraction (ZeroDocRTE) framework that generates labeled data by retrieving and denoising knowledge from LLMs. Specifically, we propose a chain-of-retrieval prompt that guides ChatGPT to generate labeled long-text data step by step. To improve the quality of the synthetic data, we propose a denoising strategy based on the consistency of cross-document knowledge. Using the denoised synthetic data, we then fine-tune LLaMA2-13B-Chat to extract document-level relation triplets. We perform experiments on both zero-shot document-level relation extraction and zero-shot document-level relation triplet extraction on two public datasets. The experimental results show that our GenRDK framework outperforms strong baselines.
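The abstract does not specify how cross-document consistency is measured. As a rough illustration only, the sketch below assumes that each synthetic document comes with a list of (head, relation, tail) triplets and that a triplet is trusted when it recurs in at least `min_support` distinct documents. The function name `denoise_by_consistency`, the threshold, and the counting scheme are hypothetical and are not the GenRDK procedure itself.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A relation triplet: (head entity, relation, tail entity).
Triplet = Tuple[str, str, str]


def denoise_by_consistency(
    docs: Dict[str, List[Triplet]], min_support: int = 2
) -> Dict[str, List[Triplet]]:
    """Keep only triplets that recur in at least `min_support` distinct documents.

    Hypothetical illustration of consistency-based filtering; the threshold
    and the support count are assumptions, not the paper's exact method.
    """
    # Count each triplet once per document so that repeats inside a single
    # document do not inflate its cross-document support.
    support: Counter = Counter()
    for triplets in docs.values():
        support.update(set(triplets))

    # For every document, retain the triplets whose support meets the
    # threshold, preserving order and dropping in-document duplicates.
    filtered: Dict[str, List[Triplet]] = {}
    for doc_id, triplets in docs.items():
        seen = set()
        kept: List[Triplet] = []
        for t in triplets:
            if support[t] >= min_support and t not in seen:
                seen.add(t)
                kept.append(t)
        filtered[doc_id] = kept
    return filtered


if __name__ == "__main__":
    # Toy synthetic labels: one consistent fact and two conflicting ones.
    docs = {
        "d1": [("Paris", "capital of", "France"), ("Seine", "located in", "Berlin")],
        "d2": [("Paris", "capital of", "France")],
        "d3": [("Seine", "located in", "France")],
    }
    # Only ("Paris", "capital of", "France") survives, in d1 and d2.
    print(denoise_by_consistency(docs))
```

Counting each triplet once per document keeps a single document that repeats a possibly hallucinated fact from vouching for that fact on its own; raising `min_support` trades recall of rare but correct facts for cleaner labels.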
