CRAG – Comprehensive RAG Benchmark
Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate the knowledge deficiencies of Large Language Models (LLMs). Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search. CRAG is designed to encapsulate a diverse array of questions across five domains and eight question categories, reflecting varied entity popularity from popular to long-tail, and temporal dynamisms ranging from years to seconds. Our evaluation of this benchmark highlights the gap to fully trustworthy QA. Whereas most advanced LLMs achieve <=34% accuracy on CRAG, adding RAG in a straightforward manner improves the accuracy only to 44%. State-of-the-art industry RAG solutions answer only 63% of questions without any hallucination. CRAG also reveals much lower accuracy in answering questions regarding facts with higher dynamism, lower popularity, or higher complexity, suggesting future research directions.
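To make the evaluation setup concrete, below is a minimal sketch of how a RAG pipeline might be scored on CRAG-style question-answer pairs. The record fields, the `mock_search` helper, and the `rag_answer` pipeline are illustrative assumptions rather than CRAG's actual API; the three-way scoring (accurate, missing, hallucinated) mirrors the outcomes the abstract refers to, where an abstention is counted as missing rather than as a hallucination.

```python
# Hypothetical sketch of a CRAG-style evaluation loop.
# The search stub and record schema are assumptions for illustration;
# the real benchmark's mock web/KG APIs and data format may differ.

from dataclasses import dataclass


@dataclass
class QARecord:
    question: str
    answer: str  # ground-truth answer


def mock_search(query: str) -> list[str]:
    """Stand-in for a mock web/KG search API: returns text snippets."""
    return []  # a real harness would return retrieved passages here


def rag_answer(question: str) -> str:
    """Toy RAG pipeline: retrieve snippets, then 'generate' an answer."""
    snippets = mock_search(question)
    # An actual system would prompt an LLM with the retrieved snippets.
    return snippets[0] if snippets else "I don't know"


def evaluate(records: list[QARecord]) -> dict[str, float]:
    """Score each answer as accurate, missing, or hallucinated."""
    accurate = missing = hallucinated = 0
    for r in records:
        pred = rag_answer(r.question)
        if pred.strip().lower() == r.answer.strip().lower():
            accurate += 1
        elif pred == "I don't know":  # abstaining counts as missing
            missing += 1
        else:
            hallucinated += 1
    n = max(len(records), 1)
    return {
        "accuracy": accurate / n,
        "missing": missing / n,
        "hallucination": hallucinated / n,
    }
```

Separating "missing" from "hallucinated" matters for the numbers quoted above: a system that answers 63% of questions without any hallucination may still abstain on many of the rest, which is a different failure mode from confidently answering wrong.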
Further reading
- Access the paper on arXiv.org