PULLM: A Multimodal Framework for Enhanced 3D Point Cloud Upsampling Using Large Language Models

  • Zhiyong Zhang
  • Ruyu Liu*
  • Xiufeng Liu
  • Yunrui Zhu
  • Yanyan Yang
  • Chaochao Wang
  • Jianhua Zhang

*Corresponding author for this work

  • Tianjin University of Technology
  • University of Portsmouth
  • Jiaxing University

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Abstract

Point cloud upsampling is a critical task in 3D computer vision, aiming to generate dense and uniformly distributed point sets from sparse inputs. While current self-supervised methods show promise, they often struggle with preserving fine-grained geometric details, especially for highly sparse point clouds. To address these limitations, we propose PointUpsampleLLM (PULLM), a novel multimodal framework that leverages the power of large language models (LLMs) to enhance 3D point cloud upsampling. PULLM integrates a pretrained Point Cloud LLM (PointLLM) with visual features extracted from point clouds, learning a unified representation that captures both geometric and semantic information. At the core of our approach is the Feature Aware Translator (FAT) module, which effectively bridges the modality gap between visual and textual features, enhancing the spatial understanding of the LLM. PULLM generates textual descriptions of point clouds on-the-fly, eliminating the need for large paired datasets. Extensive experiments on the PU1K and PUGAN benchmarks demonstrate that PULLM consistently outperforms state-of-the-art methods, achieving significant improvements in Chamfer Distance, Hausdorff Distance, and Point-to-Plane distance metrics. For instance, on the PUGAN dataset with sparse inputs, PULLM achieves a 56.15% improvement in Chamfer Distance over the best baseline. Our qualitative results further illustrate PULLM's superior ability to preserve fine details and generate high-quality upsampled point clouds across various object types and geometries.
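
The abstract names Chamfer Distance and Hausdorff Distance as evaluation metrics without defining them. The sketch below shows one common formulation of each for small point sets, under stated assumptions: the Chamfer variant here sums the mean squared nearest-neighbour distance in both directions, and the exact normalisation used in the paper is not given in this record. The function names (`chamfer_distance`, `hausdorff_distance`, `relative_improvement`) and the toy point sets are hypothetical, chosen for illustration only.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _nearest_distances(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """For each point in src, Euclidean distance to its nearest neighbour in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance (assumed variant: mean squared distance, both directions)."""
    ab = _nearest_distances(a, b)
    ba = _nearest_distances(b, a)
    return sum(d * d for d in ab) / len(ab) + sum(d * d for d in ba) / len(ba)


def hausdorff_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance: the worst-case nearest-neighbour distance."""
    return max(max(_nearest_distances(a, b)), max(_nearest_distances(b, a)))


def relative_improvement(baseline: float, ours: float) -> float:
    """Percentage reduction of an error metric relative to a baseline (lower is better)."""
    return 100.0 * (baseline - ours) / baseline


# Toy data, not taken from the paper: an upsampled set missing one reference point.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]

cd = chamfer_distance(predicted, reference)
hd = hausdorff_distance(predicted, reference)
print(f"Chamfer: {cd:.4f}, Hausdorff: {hd:.4f}")
```

This brute-force search is quadratic in the number of points, so it only suits small examples; evaluations on benchmark-sized clouds would normally use a spatial index or a GPU implementation. `relative_improvement` illustrates one plausible way a percentage gain over a baseline could be computed, which is an assumption rather than the authors' stated procedure.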

Original language: English
Title of host publication: Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing
Publication date: 2025
Pages: 1223-1230
Publication status: Published - 2025
Event: 40th Annual ACM Symposium on Applied Computing, SAC 2025 - Catania, Italy
Duration: 31 Mar 2025 to 4 Apr 2025

Conference

Conference: 40th Annual ACM Symposium on Applied Computing, SAC 2025
Country/Territory: Italy
City: Catania
Period: 31/03/2025 to 04/04/2025
Sponsor: ACM Special Interest Group on Applied Computing

Keywords

  • 3D computer vision
  • Feature aware translator (FAT)
  • Large language models (LLMs)
  • Multimodal learning
  • Point cloud upsampling
