
Abstract

Glycans are the most abundant biomolecules on Earth and participate in key processes in all living organisms. The chemical variability and topological complexity of their natural branched structures have been a challenge in computational glycobiology. As a tool for improving predictive models associated with glycobiology, we propose SweetBERT, a BERT-based language model for encoding glycan sequences which includes explicit information about the branching structure of the sequence. This is achieved by including a pseudo-graph representation in the input embeddings. Our model's performance on downstream tasks underscores the promise of Transformer architectures in addressing the complexities of glycan representation.
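
The abstract does not spell out how the pseudo-graph representation is built, so the following is only an illustrative sketch, not the paper's encoding. It assumes an IUPAC-condensed style string in which square brackets mark branches, and derives a branch depth for every token. The function name `tokenize_with_depth`, the regular expression, and the example glycan string are all hypothetical choices made for this illustration.

```python
import re

# Hypothetical tokenizer (an assumption, not SweetBERT's): splits an
# IUPAC-condensed style string into branch brackets, linkages such as
# "(b1-4)", and monosaccharide names.
TOKEN_RE = re.compile(r"\[|\]|\([^()]*\)|[^\[\]()]+")


def tokenize_with_depth(glycan: str):
    """Return (token, branch_depth) pairs; the brackets themselves are dropped."""
    depth = 0
    pairs = []
    for tok in TOKEN_RE.findall(glycan):
        if tok == "[":
            depth += 1          # entering a branch
        elif tok == "]":
            depth -= 1          # leaving a branch
            if depth < 0:
                raise ValueError("unbalanced ']' in glycan string")
        else:
            pairs.append((tok, depth))
    if depth != 0:
        raise ValueError("unbalanced '[' in glycan string")
    return pairs


# Illustrative input: a short branched mannose fragment.
pairs = tokenize_with_depth("Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc")
for token, depth in pairs:
    print(f"{token:8s} depth={depth}")
```

In a BERT-style model, per-token integers like these depths could be looked up in an additional embedding table and summed with the token and position embeddings, much as BERT already does with segment embeddings. Whether SweetBERT's input embeddings take this form is not stated in the abstract.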
Original language: English
Publication date: 2025
Number of pages: 12
Publication status: Published - 2025
Event: 2025 International Conference on Learning Representations, Singapore
Duration: 24 Apr 2025 to 28 Apr 2025

Conference

Conference: 2025 International Conference on Learning Representations
Country/Territory: Singapore
Period: 24/04/2025 to 28/04/2025

Title: 'SweetBERT: exploring BERT-based models for IUPAC glycan nomenclature modeling'