DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation

This paper presents a novel method for exerting fine-grained lighting control during text-driven diffusion-based image generation. While existing diffusion models already have the ability to generate images under any lighting condition, without additional guidance these models tend to correlate image content and lighting. Moreover, text prompts lack the necessary expressive power to describe detailed lighting setups. To provide the content creator with fine-grained control over the lighting during image generation, we augment the text prompt with detailed lighting information in the form of radiance hints, i.e., visualizations of the scene geometry with a homogeneous canonical material under the target lighting. However, the scene geometry needed to produce the radiance hints is unknown. Our key observation is that we only need to guide the diffusion process, hence exact radiance hints are not necessary; we only need to point the diffusion model in the right direction. Based on this observation, we introduce a three-stage method for controlling the lighting during image generation. In the first stage, we leverage a standard pretrained diffusion model to generate a provisional image under uncontrolled lighting. Next, in the second stage, we resynthesize and refine the foreground object in the generated image by passing the target lighting to a refined diffusion model, named DiLightNet, using radiance hints computed on a coarse shape of the foreground object inferred from the provisional image. To retain the texture details, we multiply the radiance hints with a neural encoding of the provisional synthesized image before passing it to DiLightNet. Finally, in the third stage, we resynthesize the background to be consistent with the lighting on the foreground object. We demonstrate and validate our lighting-controlled diffusion model on a variety of text prompts and lighting conditions.
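
The three-stage pipeline in the abstract can be summarized as the following minimal sketch. It is not the authors' code: every component name (base_diffusion, shape_estimator, radiance_hint_fn, appearance_encoder, dilightnet, background_inpainter) is a hypothetical placeholder for a model or renderer the method assumes, and the element-wise multiplication of hints with the image encoding is shown only schematically.

```python
import numpy as np


def dilightnet_pipeline(
    prompt: str,
    target_lighting,
    base_diffusion,        # hypothetical: pretrained text-to-image diffusion model
    shape_estimator,       # hypothetical: returns (coarse_shape, foreground_mask) from an image
    radiance_hint_fn,      # hypothetical: renders radiance hints (list of HxWx3 arrays)
    appearance_encoder,    # hypothetical: neural encoding of the provisional image
    dilightnet,            # hypothetical: refined diffusion model conditioned on the hints
    background_inpainter,  # hypothetical: inpaints a background consistent with the lighting
):
    """Sketch of the three-stage lighting-controlled generation described above."""
    # Stage 1: generate a provisional image under uncontrolled lighting.
    provisional = base_diffusion(prompt)

    # Stage 2a: infer a coarse foreground shape and render radiance hints,
    # i.e. the coarse shape shaded with homogeneous canonical materials
    # under the target lighting.
    coarse_shape, fg_mask = shape_estimator(provisional)
    hints = radiance_hint_fn(coarse_shape, target_lighting)

    # Stage 2b: multiply each hint with a neural encoding of the provisional
    # image so texture details carry over (channel shapes assumed to broadcast).
    encoding = appearance_encoder(provisional)
    conditioned_hints = [np.asarray(h) * np.asarray(encoding) for h in hints]

    # Stage 2c: resynthesize and refine the foreground object with DiLightNet.
    foreground = dilightnet(prompt, provisional, conditioned_hints)

    # Stage 3: resynthesize the background to match the foreground lighting.
    return background_inpainter(prompt, foreground, fg_mask, target_lighting)
```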

Further reading