Personalized Language Modeling from Personalized Human Feedback
Personalized large language models (LLMs) are designed to tailor responses to individual user preferences. While Reinforcement Learning from Human Feedback (RLHF) is a commonly used framework for aligning LLMs with human preferences, vanilla RLHF assumes that all human preferences share the same distribution, preventing fine-tuned LLMs from generating personalized content when user preferences are diverse. In this work, we propose Personalized-RLHF (P-RLHF), an efficient framework that utilizes a lightweight user model to capture individual user preferences and jointly learns the user model and the personalized LLM from human feedback. P-RLHF exhibits the following three characteristics: (1) It enables an LLM to generate personalized content and scale efficiently with a growing number of users. (2) It handles both explicit user preferences described as textual input and implicit user preferences encoded in the feedback data. (3) It eliminates the need for users to fully articulate their preferences, which are normally needed for prompting LLMs to generate personalized content yet are often impractical to obtain in real-world scenarios. Our experimental results show that personalized LLMs trained using P-RLHF generate responses that are more closely aligned with individual user preferences, outperforming vanilla, non-personalized RLHF and prompting-based personalization approaches across different tasks. We open-source our code at https://github.com/HumainLab/Personalized_RLHF.
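The abstract does not spell out an implementation, but the core idea (a lightweight user model learned jointly with the LLM from preference feedback) can be illustrated with a minimal sketch. Below, a per-user embedding acts as a soft prompt to a toy language model, and both are updated with a DPO-style pairwise loss over chosen/rejected responses. The toy GRU model, the reference-free loss variant, and all names, shapes, and hyperparameters are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Minimal sketch of a P-RLHF-style setup: a lightweight user model produces a
# per-user embedding that conditions the language model, and both are trained
# jointly from pairwise human feedback. Everything below is illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class UserModel(nn.Module):
    """Lightweight user model: one learned embedding per user ID (assumed design)."""
    def __init__(self, num_users: int, dim: int):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, dim)

    def forward(self, user_ids: torch.Tensor) -> torch.Tensor:
        return self.user_emb(user_ids)  # (batch, dim)


class PersonalizedLM(nn.Module):
    """Toy causal LM whose input is prefixed with the user embedding as a soft prompt."""
    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def response_logprob(self, user_vec: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # Prepend the user embedding as a single soft-prompt position.
        x = torch.cat([user_vec.unsqueeze(1), self.tok_emb(tokens)], dim=1)
        h, _ = self.rnn(x)
        logits = self.head(h[:, :-1])              # position i predicts token i
        logp = F.log_softmax(logits, dim=-1)
        # Sum per-token log-probabilities of the observed response.
        return logp.gather(-1, tokens.unsqueeze(-1)).squeeze(-1).sum(-1)  # (batch,)


def pairwise_preference_loss(policy, user_vec, chosen, rejected, beta=0.1):
    """DPO-style loss on (chosen, rejected) pairs, conditioned on the user embedding."""
    lp_c = policy.response_logprob(user_vec, chosen)
    lp_r = policy.response_logprob(user_vec, rejected)
    return -F.logsigmoid(beta * (lp_c - lp_r)).mean()


# Joint update of the user model and the personalized LM on toy feedback data.
vocab, dim, num_users = 100, 32, 8
user_model, lm = UserModel(num_users, dim), PersonalizedLM(vocab, dim)
opt = torch.optim.Adam(list(user_model.parameters()) + list(lm.parameters()), lr=1e-3)

user_ids = torch.randint(0, num_users, (4,))
chosen = torch.randint(0, vocab, (4, 10))     # preferred responses (token IDs)
rejected = torch.randint(0, vocab, (4, 10))   # dispreferred responses

loss = pairwise_preference_loss(lm, user_model(user_ids), chosen, rejected)
loss.backward()
opt.step()
print(f"toy personalized preference loss: {loss.item():.4f}")
```

In this framing, adding a new user only adds one embedding row while the LLM parameters are shared, which is one plausible reading of the abstract's claim that the approach scales efficiently with a growing number of users.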
Further reading
- Access the paper on arXiv.org