StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review


Abstract

The demand for stereo images increases as manufacturers launch more extended reality (XR) devices. To meet this demand, we introduce StereoDiffusion, a method that, unlike traditional inpainting pipelines, is training-free and straightforward to use with seamless integration into the original Stable Diffusion model. Our method modifies the latent variable to provide an end-to-end, lightweight method for fast generation of stereo image pairs, without the need for fine-tuning model weights or any post-processing of images. Using the original input to generate a left image and estimate a disparity map for it, we generate the latent vector for the right image through Stereo Pixel Shift operations, complemented by Symmetric Pixel Shift Masking Denoise and Self-Attention Layer Modifications to align the right-side image with the left-side image. Moreover, our proposed method maintains a high standard of image quality throughout the stereo generation process, achieving state-of-the-art scores in various quantitative evaluations.
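For illustration, the following is a minimal, hypothetical PyTorch sketch of the disparity-driven pixel-shift idea described in the abstract, applied to a Stable Diffusion latent. The function name, hole-masking convention, and disparity scaling are assumptions introduced here, not the paper's implementation; the paper's Symmetric Pixel Shift Masking Denoise and self-attention layer modifications are not shown.

```python
# Illustrative sketch only: shift a left-view latent by a disparity map to
# form a right-view latent, leaving occluded positions as holes to be filled
# during masked denoising. Names and details are assumptions, not the paper's code.
import torch

def stereo_pixel_shift(latent_left: torch.Tensor,
                       disparity: torch.Tensor,
                       scale: float = 1.0):
    """latent_left: (C, H, W) latent of the left view.
    disparity:   (H, W) disparity map resized to the latent resolution,
                 in latent-pixel units.
    Returns (latent_right, hole_mask), where hole_mask marks positions
    that received no pixel and must be filled by denoising."""
    C, H, W = latent_left.shape
    latent_right = torch.zeros_like(latent_left)
    filled = torch.zeros(H, W, dtype=torch.bool)

    for y in range(H):
        for x in range(W):
            # Positive disparity moves content to the left in the right view.
            x_new = x - int(round(scale * disparity[y, x].item()))
            if 0 <= x_new < W:
                latent_right[:, y, x_new] = latent_left[:, y, x]
                filled[y, x_new] = True

    return latent_right, ~filled

# Usage (shapes only): a 4x64x64 SD latent and a matching disparity map.
latent = torch.randn(4, 64, 64)
disp = torch.rand(64, 64) * 5.0
right_latent, hole_mask = stereo_pixel_shift(latent, disp)
```

In this sketch the occluded regions are simply left empty and flagged by the mask; the paper instead fills such regions during the denoising process, which is what keeps the pipeline training-free and free of image-space post-processing.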
Original language: English
Title of host publication: Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Publisher: IEEE
Publication date: 2024
Pages: 7416-7425
ISBN (Print): 979-8-3503-6548-1
ISBN (Electronic): 979-8-3503-6547-4
DOIs
Publication status: Published - 2024
Event: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops - Seattle, United States
Duration: 17 Jun 2024 - 18 Jun 2024

Conference

Conference: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops
Country/Territory: United States
City: Seattle
Period: 17/06/2024 - 18/06/2024
