How Reliable is Your Simulator? Analysis on the Limitations of Current LLM-based User Simulators for Conversational Recommendation
Conversational Recommender Systems (CRSs) interact with users through natural language to understand their preferences and provide personalized recommendations in real time. CRSs have demonstrated significant potential, making the development of more realistic and reliable user simulators a key research focus. Recently, the capabilities of Large Language Models (LLMs) have attracted considerable attention across many fields, and efforts are underway to construct user simulators based on LLMs. While these works are innovative, they also come with limitations that require attention. In this work, we analyze the limitations of using LLMs to construct user simulators for CRS, in order to guide future research. To achieve this goal, we conduct analytical validation on the notable work iEvaLM. Through multiple experiments on two widely used datasets in the field of conversational recommendation, we highlight several issues with the current evaluation methods for LLM-based user simulators: (1) data leakage, which occurs in the conversational history and the user simulator's replies, inflates evaluation results; (2) the success of CRS recommendations depends more on the availability and quality of the conversational history than on the responses from user simulators; (3) controlling the output of the user simulator through a single prompt template proves challenging. To overcome these limitations, we propose SimpleUserSim, which employs a straightforward strategy to guide the topic toward the target items. Our study validates the ability of CRS models to utilize the interaction information, significantly improving the recommendation results.
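The abstract's core idea can be illustrated with a minimal sketch: a user simulator that steers the conversation toward the target items by revealing attributes rather than titles, so the CRS must use interaction information instead of leaked answers. This is a hypothetical illustration, not the paper's actual SimpleUserSim implementation; the function name, attribute format, and reply templates are all assumptions.

```python
# Hedged sketch of a SimpleUserSim-style user simulator.
# Key idea from the abstract: guide the topic toward the target items
# without echoing the target titles verbatim, which would leak data
# into the simulator's replies and inflate evaluation results.

def simulate_user_turn(target_attrs, recommended, target_items):
    """Return the simulated user's reply for one CRS turn.

    target_attrs: attributes describing the target item (e.g. genre, era)
    recommended: item titles the CRS just recommended
    target_items: ground-truth item titles (never revealed verbatim)
    """
    hits = [r for r in recommended if r in target_items]
    if hits:
        return "Yes, that's exactly what I was looking for. Thanks!"
    # On a miss, reject and reveal one more attribute as guidance,
    # rather than naming the target item itself.
    hint = target_attrs[0] if target_attrs else "something different"
    return f"Not quite. I'd prefer {hint}."


# Toy usage with made-up items:
attrs = ["a sci-fi film from the 1980s", "directed by Ridley Scott"]
print(simulate_user_turn(attrs, ["Titanic"], ["Blade Runner"]))
print(simulate_user_turn(attrs, ["Blade Runner"], ["Blade Runner"]))
```

A simulator like this makes the information flow explicit: every hint the CRS receives comes from declared attributes, which is what lets the study separate the contribution of the conversational history from that of the simulator's replies.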