Typos that Broke the RAG's Back: Genetic Attack on RAG Pipeline by Simulating Documents in the Wild via Low-level Perturbations

The robustness of recent Large Language Models (LLMs) has become increasingly crucial as their applicability expands across various domains and real-world applications. Retrieval-Augmented Generation (RAG) is a promising solution for addressing the limitations of LLMs, yet existing studies on the robustness of RAG often overlook the interconnected relationships between RAG components or the potential threats prevalent in real-world databases, such as minor textual errors. In this work, we investigate two underexplored aspects when assessing the robustness of RAG: 1) vulnerability to noisy documents through low-level perturbations and 2) a holistic evaluation of RAG robustness. Furthermore, we introduce a novel attack method, the Genetic Attack on RAG (GARAG), which targets these aspects. Specifically, GARAG is designed to reveal vulnerabilities within each component and to test the overall system functionality against noisy documents. We validate RAG robustness by applying GARAG to standard QA datasets, incorporating diverse retrievers and LLMs. The experimental results show that GARAG consistently achieves high attack success rates. It also severely degrades the performance of each component and their synergy, highlighting the substantial risk that minor textual inaccuracies pose to RAG systems in the real world.
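To make the attack setting concrete, the following is a minimal sketch of a genetic search over character-level document perturbations, in the spirit of the abstract's description. The mutation operators, the fitness function (combining a drop in retrieval score with a flipped reader answer), and both scoring hooks (`retrieval_score`, `reader_is_correct`) are illustrative assumptions, not GARAG's actual design; in practice the hooks would call the target retriever and LLM reader.

```python
import random
import string

def typo_perturb(doc: str, n_edits: int = 1) -> str:
    """Apply n random character-level edits (swap / delete / insert / substitute)."""
    chars = list(doc)
    for _ in range(n_edits):
        if len(chars) < 2:
            break
        i = random.randrange(len(chars) - 1)
        op = random.choice(["swap", "delete", "insert", "substitute"])
        if op == "swap":
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
        elif op == "delete":
            del chars[i]
        elif op == "insert":
            chars.insert(i, random.choice(string.ascii_lowercase))
        else:
            chars[i] = random.choice(string.ascii_lowercase)
    return "".join(chars)

# Placeholder scoring hooks (assumptions): replace with real retriever /
# reader calls. Here, retrieval score is crude word overlap, and the reader
# is "correct" if the gold answer string survives in the document.
def retrieval_score(query: str, doc: str) -> float:
    q = query.lower().split()
    return len(set(q) & set(doc.lower().split())) / max(len(q), 1)

def reader_is_correct(query: str, doc: str, gold: str) -> bool:
    return gold.lower() in doc.lower()

def fitness(query: str, doc: str, gold: str) -> float:
    # The attacker wants a LOW retrieval score and a WRONG reader answer.
    score = -retrieval_score(query, doc)
    score += 0.0 if reader_is_correct(query, doc, gold) else 1.0
    return score

def genetic_attack(query, doc, gold, pop_size=20, generations=30):
    """Evolve typo-perturbed variants of `doc` that maximize attack fitness."""
    population = [typo_perturb(doc) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda d: fitness(query, d, gold), reverse=True)
        parents = population[: pop_size // 2]           # keep the fittest half
        children = [typo_perturb(random.choice(parents))  # mutate survivors
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=lambda d: fitness(query, d, gold))

if __name__ == "__main__":
    adv = genetic_attack("who wrote hamlet", "Hamlet was written by Shakespeare.", "Shakespeare")
    print(adv)
```

This toy loop captures the core idea the abstract names: small, human-plausible typos accumulate under selection pressure until they degrade both retrieval and generation at once.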
