Enhancing the efficiency of protein language models with minimal wet-lab data through few-shot learning

Accurately modeling protein fitness landscapes holds great importance for protein engineering. Recently, due to their capacity and representation ability, pre-trained protein language models have achieved state-of-the-art performance in predicting protein fitness without experimental data. However, their predictions are limited in accuracy as well as interpretability. Furthermore, such deep learning models require abundant labeled training examples for performance improvements, posing a practical barrier. In this work, we introduce FSFP, a training strategy that can effectively optimize protein language models under extreme data scarcity. By combining the techniques of meta-transfer learning, learning to rank, and parameter-efficient fine-tuning, FSFP can significantly boost the performance of various protein language models using merely tens of labeled single-site mutants from the target protein. The experiments across 87 deep mutational scanning datasets underscore its superiority over both unsupervised and supervised approaches, revealing its potential in facilitating AI-guided protein design.
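To make the combination of learning to rank and parameter-efficient fine-tuning concrete, here is a minimal PyTorch sketch of that idea, not the paper's actual implementation: a toy stand-in model (`ToyPLM`) with frozen pre-trained weights is adapted through a small LoRA-style low-rank update and trained with a pairwise ranking loss on a handful of labeled mutants. All class and function names, the margin value, and the dummy data are illustrative assumptions; the meta-transfer learning component of FSFP is omitted entirely.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank (LoRA-style) update."""
    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep pre-trained weights fixed
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x):
        # Base projection plus the low-rank correction B @ A
        return self.base(x) + x @ self.A.T @ self.B.T

class ToyPLM(nn.Module):
    """Toy stand-in for a pre-trained protein language model (assumption)."""
    def __init__(self, vocab: int = 25, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        for p in self.embed.parameters():
            p.requires_grad = False  # "pre-trained" embeddings stay frozen
        self.proj = LoRALinear(nn.Linear(dim, dim))
        self.head = nn.Linear(dim, 1)  # scalar fitness score per sequence

    def forward(self, tokens):  # tokens: (batch, seq_len)
        h = self.proj(self.embed(tokens)).mean(dim=1)  # mean-pool residues
        return self.head(h).squeeze(-1)                # (batch,)

def pairwise_ranking_loss(scores, labels, margin: float = 0.1):
    """Hinge loss over all pairs: a fitter mutant should score higher."""
    score_diff = scores[:, None] - scores[None, :]
    label_diff = labels[:, None] - labels[None, :]
    mask = label_diff > 0  # pairs where the first mutant is measurably fitter
    return torch.clamp(margin - score_diff[mask], min=0).mean()

# Few-shot loop on ~tens of labeled single-site mutants (dummy data here).
model = ToyPLM()
trainable = [p for p in model.parameters() if p.requires_grad]
opt = torch.optim.Adam(trainable, lr=1e-3)
tokens = torch.randint(0, 25, (20, 50))  # 20 mutant sequences, length 50
labels = torch.randn(20)                 # measured fitness values
for step in range(100):
    opt.zero_grad()
    loss = pairwise_ranking_loss(model(tokens), labels)
    loss.backward()
    opt.step()
```

Only the low-rank matrices and the scoring head are updated, which is what keeps such fine-tuning feasible with so few labels; the ranking objective targets the relative ordering of mutants rather than their absolute fitness values, matching how candidates are prioritized in protein engineering.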
