CRAG – Comprehensive RAG Benchmark
Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate the knowledge deficiencies of Large Language Models (LLMs). Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search. CRAG is designed to encapsulate a diverse array of questions across five domains and eight question categories, reflecting varied entity popularity from popular to long-tail, and temporal dynamisms ranging from years to seconds. Our evaluation of this benchmark highlights the gap to fully trustworthy QA. Whereas most advanced LLMs achieve <=34% accuracy on CRAG, adding RAG in a straightforward manner improves the accuracy only to 44%. State-of-the-art industry RAG solutions answer only 63% of questions without any hallucination. CRAG also reveals much lower accuracy in answering questions regarding facts with higher dynamism, lower popularity, or higher complexity, suggesting future research directions.
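To make the evaluation setup concrete, below is a minimal sketch of how a RAG pipeline might be scored on CRAG-style question-answer pairs. The record fields, the `mock_search` helper, and the `rag_answer` pipeline are illustrative assumptions rather than CRAG's actual API; the three-way scoring (accurate, missing, hallucinated) mirrors the outcomes the abstract refers to, where an abstention is counted as missing rather than as a hallucination.

```python
# Hypothetical sketch of a CRAG-style evaluation loop.
# The search stub and record schema are assumptions for illustration;
# the real benchmark's mock web/KG APIs and data format may differ.

from dataclasses import dataclass


@dataclass
class QARecord:
    question: str
    answer: str  # ground-truth answer


def mock_search(query: str) -> list[str]:
    """Stand-in for a mock web/KG search API: returns text snippets."""
    return []  # a real harness would return retrieved passages here


def rag_answer(question: str) -> str:
    """Toy RAG pipeline: retrieve snippets, then 'generate' an answer."""
    snippets = mock_search(question)
    # An actual system would prompt an LLM with the retrieved snippets.
    return snippets[0] if snippets else "I don't know"


def evaluate(records: list[QARecord]) -> dict[str, float]:
    """Score each answer as accurate, missing, or hallucinated."""
    accurate = missing = hallucinated = 0
    for r in records:
        pred = rag_answer(r.question)
        if pred.strip().lower() == r.answer.strip().lower():
            accurate += 1
        elif pred == "I don't know":  # abstaining counts as missing
            missing += 1
        else:
            hallucinated += 1
    n = max(len(records), 1)
    return {
        "accuracy": accurate / n,
        "missing": missing / n,
        "hallucination": hallucinated / n,
    }
```

Separating "missing" from "hallucinated" matters for the numbers quoted above: a system that answers 63% of questions without any hallucination may still abstain on many of the rest, which is a different failure mode from confidently answering wrong.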
Further reading
- Access the paper on arXiv.org