Dual Operating Modes of In-Context Learning
In-context learning (ICL) exhibits dual operating modes: task learning, i.e., acquiring a new skill from in-context samples, and task retrieval, i.e., locating and activating a relevant pretrained skill. Recent theoretical work investigates various mathematical models to analyze ICL, but existing models explain only one operating mode at a time. We introduce a probabilistic model, with which one can explain the dual operating modes of ICL simultaneously. Focusing on in-context learning of linear functions, we extend existing models for pretraining data by introducing multiple task groups and task-dependent input distributions. We then analyze the behavior of the optimally pretrained model under the squared loss, i.e., the MMSE estimator of the label given in-context examples. Regarding the pretraining task distribution as the prior and in-context examples as the observation, we derive the closed-form expression of the task posterior distribution. With the closed-form expression, we obtain a quantitative understanding of the two operating modes of ICL. Furthermore, we shed light on an unexplained phenomenon observed in practice: under certain settings, the ICL risk initially increases and then decreases with more in-context examples. Our model offers a plausible explanation for this “early ascent” phenomenon: a limited number of in-context samples may lead to the retrieval of an incorrect skill, thereby increasing the risk, which will eventually diminish as task learning takes effect with more in-context samples. We also theoretically analyze ICL with biased labels, e.g., zero-shot ICL, where in-context examples are assigned random labels. Lastly, we validate our findings and predictions via experiments involving Transformers and large language models.
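As a rough illustration of the Bayesian view in the abstract, the sketch below assumes a simplified, hypothetical setup (not the paper's exact model): linear tasks w drawn from a mixture of K Gaussian task groups with means mu_k, isotropic within-group covariance tau² I, and Gaussian label noise with variance sigma². Under these assumptions the task posterior is again a mixture of Gaussians, and the MMSE prediction is the posterior-weighted average of the per-group predictions. All names and parameters here (`posterior_predict`, `mus`, `pis`, `tau2`, `sigma2`) are illustrative choices.

```python
import numpy as np
from scipy.stats import multivariate_normal

def posterior_predict(X, y, x_query, mus, pis, tau2, sigma2):
    """MMSE prediction for linear ICL under an assumed Gaussian-mixture task prior.

    X: (n, d) in-context inputs, y: (n,) labels, x_query: (d,) query input.
    mus: (K, d) task-group means, pis: (K,) mixing weights.
    tau2: within-group task variance, sigma2: label-noise variance.
    """
    n, d = X.shape
    K = len(pis)
    log_weights = np.zeros(K)
    comp_means = np.zeros((K, d))
    # Shared posterior covariance: identical across groups because every
    # group uses the same isotropic prior covariance tau2 * I.
    Sigma = np.linalg.inv(X.T @ X / sigma2 + np.eye(d) / tau2)
    for k in range(K):
        # Conjugate Gaussian posterior mean of the task weights under group k.
        comp_means[k] = Sigma @ (X.T @ y / sigma2 + mus[k] / tau2)
        # Marginal likelihood of the observed labels under group k.
        cov_y = sigma2 * np.eye(n) + tau2 * (X @ X.T)
        log_weights[k] = np.log(pis[k]) + multivariate_normal.logpdf(
            y, mean=X @ mus[k], cov=cov_y)
    # Posterior probability of each task group (softmax of the log weights).
    alpha = np.exp(log_weights - log_weights.max())
    alpha /= alpha.sum()
    # MMSE label prediction: posterior-weighted mixture of per-group predictions.
    y_hat = sum(alpha[k] * x_query @ comp_means[k] for k in range(K))
    return y_hat, alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, K = 5, 3
    mus = rng.normal(size=(K, d))            # pretrained task-group centers
    pis = np.full(K, 1.0 / K)
    tau2, sigma2 = 0.1, 0.25
    w_true = mus[0] + np.sqrt(tau2) * rng.normal(size=d)   # task near group 0
    for n in (2, 64):                        # few vs. many in-context examples
        X = rng.normal(size=(n, d))
        y = X @ w_true + np.sqrt(sigma2) * rng.normal(size=n)
        _, alpha = posterior_predict(X, y, rng.normal(size=d),
                                     mus, pis, tau2, sigma2)
        print(n, np.round(alpha, 3))         # posterior over task groups
```

In this toy setting, both operating modes are visible: with few in-context examples the group weights `alpha` dominate, so the prediction is driven by whichever pretrained group the examples happen to point at (task retrieval, and a misleading early sample can select the wrong group); as the number of examples grows, each component mean converges toward the least-squares fit of the observed data and the prediction becomes insensitive to the prior means (task learning), which is the intuition behind the early-ascent explanation above.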
Further reading
- Access Paper in arXiv.org