Out of Many, One: Designing and Scaffolding Proteins at the Scale of the Structural Universe with Genie 2

Protein diffusion models have emerged as a promising approach for protein design. One such pioneering model is Genie, a method that asymmetrically represents protein structures during the forward and backward processes, using simple Gaussian noising for the former and expressive SE(3)-equivariant attention for the latter. In this work we introduce Genie 2, extending Genie to capture a larger and more diverse protein structure space through architectural innovations and massive data augmentation. Genie 2 adds motif scaffolding capabilities via a novel multi-motif framework that designs co-occurring motifs with unspecified inter-motif positions and orientations. This makes possible complex protein designs that engage multiple interaction partners and perform multiple functions. On both unconditional and conditional generation, Genie 2 achieves state-of-the-art performance, outperforming all known methods on key design metrics including designability, diversity, and novelty. Genie 2 also solves more motif scaffolding problems than other methods and does so with more unique and varied solutions. Taken together, these advances set a new standard for structure-based protein design. Genie 2 inference and training code, as well as model weights, are freely available at: https://github.com/aqlaboratory/genie2.
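To make the "simple Gaussian noising" of the forward process concrete, the following is a minimal sketch of a standard closed-form diffusion forward step applied to a toy set of 3D points. It is not taken from the Genie 2 codebase: the function names, the linear noise schedule, its parameters, and the treatment of each residue as a single (x, y, z) point are all illustrative assumptions.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule.

    The schedule and its endpoints are assumptions made for illustration,
    not values reported for Genie or Genie 2.
    """
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_coordinates(coords, alpha_bar, rng):
    """Sample x_t ~ N(sqrt(alpha_bar) * x_0, (1 - alpha_bar) * I).

    coords is a list of (x, y, z) tuples; every coordinate receives
    independent Gaussian noise.
    """
    scale = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [
        tuple(scale * c + sigma * rng.gauss(0.0, 1.0) for c in point)
        for point in coords
    ]


# Toy "backbone": five points spaced 3.8 units apart along the x axis.
rng = random.Random(0)
x0 = [(3.8 * i, 0.0, 0.0) for i in range(5)]

alpha_bars = linear_alpha_bars(1000)
x_mid = noise_coordinates(x0, alpha_bars[499], rng)  # partially noised
x_end = noise_coordinates(x0, alpha_bars[-1], rng)   # close to pure noise
```

In this sketch the forward process needs no learned parameters: the signal is scaled down and isotropic noise is added, so all structural reasoning is left to the learned reverse (denoising) process.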
