Med-Art: Diffusion Transformer for 2D Medical Text-to-Image Generation

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Abstract

Text-to-image generative models have achieved remarkable breakthroughs in recent years. However, their application to medical image generation still faces significant challenges, including small dataset sizes and the scarcity of medical textual data. To address these challenges, we propose Med-Art, a framework specifically designed for medical image generation with limited data. Med-Art leverages vision-language models to generate visual descriptions of medical images, which overcomes the scarcity of applicable medical textual data. Med-Art adapts a large-scale pre-trained text-to-image model, PixArt-α, based on the Diffusion Transformer (DiT), achieving high performance under limited data. Furthermore, we propose an innovative Hybrid-Level Diffusion Fine-tuning (HLDF) method, which enables pixel-level losses, effectively addressing issues such as overly saturated colors. We achieve state-of-the-art performance on two medical image datasets, measured by FID, KID, and downstream classification performance. The project is available at https://medart-ai.github.io.
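
For readers unfamiliar with KID, one of the metrics named in the abstract, the following is a minimal sketch of the standard unbiased estimator: a squared maximum mean discrepancy with the cubic polynomial kernel k(x, y) = (x·y / d + 1)^3. It is not the authors' evaluation code. The function names `poly_kernel` and `kid` are hypothetical, and the plain feature vectors stand in for the Inception features on which KID is normally computed (feature extraction and subset averaging are omitted here).

```python
from typing import Sequence

Vector = Sequence[float]


def poly_kernel(x: Vector, y: Vector) -> float:
    """Cubic polynomial kernel k(x, y) = (x.y / d + 1)^3, with d the feature dimension."""
    d = len(x)
    dot = sum(a * b for a, b in zip(x, y))
    return (dot / d + 1.0) ** 3


def kid(real: Sequence[Vector], fake: Sequence[Vector]) -> float:
    """Unbiased squared-MMD estimate between two sets of feature vectors."""
    m, n = len(real), len(fake)
    if m < 2 or n < 2:
        raise ValueError("need at least two samples in each set")

    # Within-set terms skip the diagonal (i == j) so the estimate is unbiased.
    k_rr = sum(poly_kernel(real[i], real[j])
               for i in range(m) for j in range(m) if i != j)
    k_ff = sum(poly_kernel(fake[i], fake[j])
               for i in range(n) for j in range(n) if i != j)
    # Cross term uses every real/fake pair.
    k_rf = sum(poly_kernel(r, f) for r in real for f in fake)

    return k_rr / (m * (m - 1)) + k_ff / (n * (n - 1)) - 2.0 * k_rf / (m * n)


if __name__ == "__main__":
    # Toy 2-D "features"; real evaluations use high-dimensional Inception features.
    real_feats = [[0.0, 0.0], [0.0, 0.0]]
    fake_feats = [[2.0, 0.0], [2.0, 0.0]]
    print(kid(real_feats, fake_feats))
```

Lower values indicate that the two feature distributions are closer. Because the estimator is unbiased, it can be slightly negative when the two sets are drawn from the same distribution.
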
Original language: English
Title of host publication: Proceedings of the 5th MICCAI Workshop on Deep Generative Models
Publisher: Springer
Publication date: 2026
Pages: 57-66
ISBN (Print): 978-3-032-05471-5
ISBN (Electronic): 978-3-032-05472-2
Publication status: Published - 2026
Event: The 5th MICCAI Workshop on Deep Generative Models - Daejeon, Korea, Republic of
Duration: 23 Sept 2025 → 23 Sept 2025

Workshop

Workshop: The 5th MICCAI Workshop on Deep Generative Models
Country/Territory: Korea, Republic of
City: Daejeon
Period: 23/09/2025 → 23/09/2025

Keywords

  • Text-to-image
  • Generative models
  • Medical image generation
