MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
Recent advances in Text-to-Video generation (T2V) have achieved remarkable success in synthesizing high-quality general videos from textual descriptions. A largely overlooked problem in T2V is that existing models have not adequately encoded the physical knowledge of the real world, so the videos they generate tend to have limited motion and poor variation. In this paper, we propose MagicTime, a metamorphic time-lapse video generation model that learns real-world physics knowledge from time-lapse videos and implements metamorphic generation. First, we design a MagicAdapter scheme to decouple spatial and temporal training, encode more physical knowledge from metamorphic videos, and transform pre-trained T2V models to generate metamorphic videos. Second, we introduce a Dynamic Frames Extraction strategy to adapt to metamorphic time-lapse videos, which have a wider variation range and cover dramatic object metamorphic processes, and thus embody more physical knowledge than general videos. Finally, we introduce a Magic Text-Encoder to improve the understanding of metamorphic video prompts. Furthermore, we create a time-lapse video-text dataset called ChronoMagic, specifically curated to unlock the metamorphic video generation ability. Extensive experiments demonstrate the superiority and effectiveness of MagicTime in generating high-quality and dynamic metamorphic videos, suggesting that time-lapse video generation is a promising path toward building metamorphic simulators of the physical world. Code: https://github.com/PKU-YuanGroup/MagicTime
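The spatial/temporal decoupling behind MagicAdapter can be illustrated with a short sketch: the pre-trained spatial layers of the T2V backbone stay frozen to preserve general generation ability, while a small temporal module trained on metamorphic videos mixes information across frames. The code below is a minimal, hypothetical rendering of that idea in PyTorch; the class name, layer choices, and tensor layout are illustrative assumptions, not the MagicTime implementation (which operates inside a diffusion UNet).

```python
import torch
import torch.nn as nn

class MagicAdapterBlock(nn.Module):
    """Sketch of spatial/temporal decoupling (illustrative, not the
    official MagicTime module): a frozen spatial layer is applied
    per frame, then a trainable temporal adapter attends across frames."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.spatial = nn.Linear(dim, dim)   # stand-in for a pre-trained spatial layer
        self.spatial.requires_grad_(False)   # frozen: keeps general T2V knowledge
        self.temporal = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, tokens, dim)
        b, f, t, d = x.shape
        x = self.spatial(x)                              # per-frame, frozen
        x = x.permute(0, 2, 1, 3).reshape(b * t, f, d)   # attend across frames
        attn, _ = self.temporal(x, x, x)
        x = self.norm(x + attn)                          # residual temporal adapter
        return x.reshape(b, t, f, d).permute(0, 2, 1, 3)

block = MagicAdapterBlock(dim=64)
video_tokens = torch.randn(2, 8, 16, 64)   # batch, frames, tokens, channels
out = block(video_tokens)
print(out.shape)                           # torch.Size([2, 8, 16, 64])
```

Because only the temporal adapter and norm receive gradients, fine-tuning on time-lapse data cannot erase the backbone's spatial knowledge, which is the point of the decoupled training scheme.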
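Dynamic Frames Extraction must cover the entire metamorphic process (e.g. a full bloom or a complete construction) rather than the short fixed-stride clip typical of general-video training. A minimal sketch, assuming the strategy reduces to choosing a fixed number of frame indices spread uniformly across the whole video; the function name and parameters here are illustrative:

```python
import numpy as np

def dynamic_frame_indices(total_frames: int, num_samples: int = 16) -> np.ndarray:
    """Pick `num_samples` frame indices spanning the whole video, so the
    sampled clip captures the complete metamorphic process instead of a
    narrow temporal window."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    num_samples = min(num_samples, total_frames)
    # Uniformly spaced indices from the first frame to the last.
    return np.linspace(0, total_frames - 1, num_samples).round().astype(int)

# Example: a 1,200-frame time-lapse reduced to 16 training frames.
print(dynamic_frame_indices(1200, 16))
```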
Further reading
- Access the paper on arXiv.org