Abstract
Glycans are the most abundant biomolecules on Earth and participate in key processes in all living organisms. The chemical variability and topological complexity of their natural branched structures have long been a challenge in computational glycobiology. As a tool for improving predictive models in glycobiology, we propose SweetBERT, a BERT-based language model for encoding glycan sequences that incorporates explicit information about the branching structure of each sequence. This is achieved by including a pseudo-graph representation in the input embeddings. Our model's performance on downstream tasks underscores the promise of Transformer architectures for addressing the complexities of glycan representation.
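The abstract does not say how the pseudo-graph representation is built, so the following is only a minimal sketch of one way branching information could be exposed to a token-based model. It is not SweetBERT's actual method. It assumes IUPAC-condensed notation, where square brackets delimit branches, and the names `TOKEN_RE` and `tokenize_with_depth` are hypothetical.

```python
import re

# Hypothetical token pattern: a branch bracket, a linkage such as "(b1-4)",
# or a run of other characters (a monosaccharide name such as "GlcNAc").
TOKEN_RE = re.compile(r"\[|\]|\([^()]*\)|[^\[\]()]+")


def tokenize_with_depth(glycan: str) -> list[tuple[str, int]]:
    """Split an IUPAC-condensed glycan string into (token, branch_depth) pairs.

    branch_depth counts how many square brackets enclose the token.
    The brackets themselves are consumed and not returned as tokens.
    """
    tokens = []
    depth = 0
    for match in TOKEN_RE.finditer(glycan):
        tok = match.group()
        if tok == "[":
            depth += 1
        elif tok == "]":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ']' in glycan string")
        else:
            tokens.append((tok, depth))
    if depth != 0:
        raise ValueError("unbalanced '[' in glycan string")
    return tokens


if __name__ == "__main__":
    for token, depth in tokenize_with_depth("Gal(b1-4)[Fuc(a1-3)]GlcNAc"):
        print(depth, token)
```

Under this assumption, the depth ids could be mapped to a learned embedding and summed with the token embeddings, in the same way BERT sums token and position embeddings. Whether SweetBERT does anything similar is not stated in the abstract.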
| Original language | English |
|---|---|
| Publication date | 2025 |
| Number of pages | 12 |
| Publication status | Published - 2025 |
| Event | 2025 International Conference on Learning Representations, Singapore (24 Apr 2025 → 28 Apr 2025) |
Conference
| Conference | 2025 International Conference on Learning Representations |
|---|---|
| Country/Territory | Singapore |
| Period | 24/04/2025 → 28/04/2025 |
Title: SweetBERT: exploring BERT-based models for IUPAC glycan nomenclature modeling