Spiral of Silence: How is Large Language Model Killing Information Retrieval? – A Case Study on Open Domain Question Answering

The practice of Retrieval-Augmented Generation (RAG), which integrates Large Language Models (LLMs) with retrieval systems, has become increasingly prevalent. However, the repercussions of LLM-derived content infiltrating the web and influencing the retrieval-generation feedback loop are largely uncharted territories. In this study, we construct and iteratively run a simulation pipeline to deeply investigate the short-term and long-term effects of LLM text on RAG systems. Taking the trending Open Domain Question Answering (ODQA) task as a point of entry, our findings reveal a potential digital "Spiral of Silence" effect, with LLM-generated text consistently outperforming human-authored content in search rankings, thereby diminishing the presence and impact of human contributions online. This trend risks creating an imbalanced information ecosystem, where the unchecked proliferation of erroneous LLM-generated content may result in the marginalization of accurate information. We urge the academic community to take heed of this potential issue, ensuring a diverse and authentic digital information landscape.
