Abstract
The demand for stereo images is increasing as manufacturers launch more extended reality (XR) devices. To meet this demand, we introduce StereoDiffusion, a method that, unlike traditional inpainting pipelines, is training-free, straightforward to use, and integrates seamlessly into the original Stable Diffusion model. Our method modifies the latent variable to provide an end-to-end, lightweight way to generate stereo image pairs quickly, without fine-tuning model weights or any post-processing of images. From the original input, we generate a left image and estimate a disparity map for it; the latent vector for the right image is then produced through Stereo Pixel Shift operations, complemented by Symmetric Pixel Shift Masking Denoise and Self-Attention Layer Modifications to align the right-side image with the left-side image. Moreover, our proposed method maintains a high standard of image quality throughout the stereo generation process, achieving state-of-the-art scores in various quantitative evaluations.
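The core idea of warping a left view into a right view by a per-pixel disparity can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the latent layout, and the hole-mask handling are assumptions, and the paper applies the shift in latent space with additional masking and attention modifications.

```python
import numpy as np

def shift_by_disparity(latent, disparity):
    """Forward-warp an array horizontally by a per-pixel disparity map.

    latent:    (H, W, C) array (an image or a latent feature map).
    disparity: (H, W) integer pixel shifts.
    Each pixel moves left by its disparity to form the right view;
    targets never written to are reported in a hole mask, which a
    masked denoising step would then have to fill in.
    """
    H, W, _ = latent.shape
    out = np.zeros_like(latent)
    filled = np.zeros((H, W), dtype=bool)
    for y in range(H):
        for x in range(W):
            xs = x - int(disparity[y, x])  # shift toward the left for the right view
            if 0 <= xs < W:
                out[y, xs] = latent[y, x]
                filled[y, xs] = True
    return out, ~filled  # warped array and mask of occlusion holes
```

With zero disparity the warp is the identity; with a constant positive disparity the content shifts left and the rightmost columns become holes to be filled during denoising.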
Original language | English |
---|---|
Title of host publication | Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) |
Publisher | IEEE |
Publication date | 2024 |
Pages | 7416-7425 |
ISBN (Print) | 979-8-3503-6548-1 |
ISBN (Electronic) | 979-8-3503-6547-4 |
DOIs | |
Publication status | Published - 2024 |
Event | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, United States. Duration: 17 Jun 2024 → 18 Jun 2024 |
Conference
Conference | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops |
---|---|
Country/Territory | United States |
City | Seattle |
Period | 17/06/2024 → 18/06/2024 |