Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation

The Open-MAGVIT2 project produces an open-source replication of Google’s MAGVIT-v2 tokenizer, a tokenizer with a super-large codebook (i.e., 2^18 codes), and achieves state-of-the-art reconstruction performance on the ImageNet and UCF benchmarks. We also provide a tokenizer pre-trained on large-scale data, significantly outperforming Cosmos on zero-shot benchmarks (1.93 vs. 0.78 rFID on ImageNet at original resolution). Furthermore, we explore its application in plain auto-regressive models to validate scalability properties, producing a family of auto-regressive image generation models ranging from 300M to 1.5B parameters. To assist auto-regressive models in predicting with a super-large vocabulary, we factorize it into two sub-vocabularies of different sizes by asymmetric token factorization, and further introduce “next sub-token prediction” to enhance sub-token interaction for better generation quality. We release all models and code to foster innovation and creativity in the field of auto-regressive visual generation.
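
As a rough illustration of asymmetric token factorization, the sketch below splits an 18-bit token index into two sub-tokens drawn from sub-vocabularies of different sizes, and merges them back. The 6-bit/12-bit split and the names `factorize` and `merge` are assumptions chosen for illustration; the abstract states only that the two sub-vocabularies differ in size. The sketch covers the index arithmetic only, not next sub-token prediction or any model component.

```python
from typing import Tuple

# Assumed split of the 2^18 vocabulary (illustrative, not taken from the abstract):
# a small sub-vocabulary of 2^6 entries and a large one of 2^12 entries.
TOTAL_BITS = 18
LOW_BITS = 12                        # second sub-vocabulary: 2**12 = 4096 entries
HIGH_BITS = TOTAL_BITS - LOW_BITS    # first sub-vocabulary: 2**6 = 64 entries


def factorize(token: int) -> Tuple[int, int]:
    """Split one token index in [0, 2^18) into (high, low) sub-token indices."""
    if not 0 <= token < (1 << TOTAL_BITS):
        raise ValueError(f"token {token} is outside the 2^{TOTAL_BITS} vocabulary")
    high = token >> LOW_BITS                 # top 6 bits
    low = token & ((1 << LOW_BITS) - 1)      # bottom 12 bits
    return high, low


def merge(high: int, low: int) -> int:
    """Recombine two sub-token indices into the original token index."""
    if not 0 <= high < (1 << HIGH_BITS):
        raise ValueError(f"high sub-token {high} is out of range")
    if not 0 <= low < (1 << LOW_BITS):
        raise ValueError(f"low sub-token {low} is out of range")
    return (high << LOW_BITS) | low
```

With a split like this, an auto-regressive model would predict over 64 and 4096 classes instead of a single 262,144-way classification, and the mapping between token indices and sub-token pairs stays lossless.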
