To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering

Medical open-domain question answering demands substantial access to specialized knowledge. Recent efforts have sought to decouple knowledge from model parameters, counteracting architectural scaling and allowing for training on common low-resource hardware. The retrieve-then-read paradigm has become ubiquitous, with model predictions grounded on relevant knowledge pieces from external repositories such as PubMed, textbooks, and UMLS. An alternative path, still under-explored but made possible by the advent of domain-specific large language models, entails constructing artificial contexts through prompting. As a result, “to generate or to retrieve” is the modern equivalent of Hamlet’s dilemma. This paper presents MedGENIE, the first generate-then-read framework for multiple-choice question answering in medicine. We conduct extensive experiments on MedQA-USMLE, MedMCQA, and MMLU, incorporating a practical perspective by assuming a maximum of 24GB VRAM. MedGENIE sets a new state-of-the-art in the open-book setting of each testbed, allowing a small-scale reader to outcompete zero-shot closed-book 175B baselines while using up to 706× fewer parameters. Our findings reveal that generated passages are more effective than retrieved ones in attaining higher accuracy.
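For concreteness, the generate-then-read recipe can be sketched in a few lines: prompt a language model for an artificial background passage, then let a small reader score each answer option conditioned on that passage. The sketch below is a minimal illustration of the paradigm under stated assumptions, not the paper's actual MedGENIE configuration: the model names (`gpt2` as a stand-in for both the generator and the reader), the prompt template, and the likelihood-based scoring rule are all placeholders.

```python
# Minimal generate-then-read sketch for multiple-choice QA.
# Model names, prompt template, and scoring rule are illustrative
# assumptions, not the configuration used by MedGENIE.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Step 1: GENERATE an artificial context by prompting a language model
# (a domain-specific medical LLM would be used in practice).
generator = pipeline("text-generation", model="gpt2")
question = "Deficiency of which vitamin causes scurvy?"
options = {"A": "Vitamin A", "B": "Vitamin B12", "C": "Vitamin C", "D": "Vitamin D"}
prompt = f"Write a short background passage useful for answering: {question}\n"
context = generator(prompt, max_new_tokens=128)[0]["generated_text"]

# Step 2: READ -- a small causal LM scores each option by the average
# log-likelihood of its tokens given the generated context and question.
tok = AutoTokenizer.from_pretrained("gpt2")
reader = AutoModelForCausalLM.from_pretrained("gpt2")

def option_score(context: str, question: str, option: str) -> float:
    prefix = f"{context}\nQuestion: {question}\nAnswer:"
    prefix_len = tok(prefix, return_tensors="pt").input_ids.shape[1]
    # The leading space attaches to the option's first BPE token, so the
    # prefix tokenization stays a prefix of the full tokenization.
    full_ids = tok(prefix + " " + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = reader(full_ids).logits
    # Logits at position i predict the token at position i + 1, so score
    # only the positions whose *next* token belongs to the option.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    picks = [logprobs[i, full_ids[0, i + 1]]
             for i in range(prefix_len - 1, full_ids.shape[1] - 1)]
    return float(torch.stack(picks).mean())

best = max(options, key=lambda k: option_score(context, question, options[k]))
print(best, options[best])
```

Averaging per-token log-likelihood rather than summing avoids penalizing longer options. Note that the only difference from a retrieve-then-read pipeline is Step 1: the generated passage simply takes the place that a retrieved document from PubMed, a textbook, or UMLS would otherwise occupy.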
