Rank and Select on Degenerate Strings

Philip Bille, Inge Li Gørtz, Tord Stordalen

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review


A degenerate string is a sequence of subsets of some alphabet; it represents any string obtainable by selecting one character from each set from left to right. Recently, Alanko et al. generalized the rank-select problem to degenerate strings, where given a character c and position i the goal is to find either the ith set containing c or the number of occurrences of c in the first i sets [SEA 2023]. The problem has applications to pangenomics; in another work by Alanko et al. they use it as the basis for a compact representation of de Bruijn Graphs that supports fast membership queries.In this paper we revisit the rank-select problem on degenerate strings, introducing a new, natural parameter and reanalyzing existing reductions to rank-select on regular strings. Plugging in standard data structures, the time bounds for queries are improved exponentially while essentially matching, or improving, the space bounds. Furthermore, we provide a lower bound on space that shows that the reductions lead to succinct data structures in a wide range of cases. Finally, we provide implementations; our most compact structure matches the space of the most compact structure of Alanko et al. while answering queries twice as fast. We also provide an implementation using modern vector processing features; it uses less than one percent more space than the most compact structure of Alanko et al. while supporting queries four to seven times faster, and has competitive query time with all the remaining structures.
Original languageEnglish
Title of host publicationProceedings of the 2024 Data Compression Conference (DCC)
Publication date2024
ISBN (Print)979-8-3503-8588-5
ISBN (Electronic)979-8-3503-8587-8
Publication statusPublished - 2024
Event2024 Data Compression Conference - Snowbird, United States
Duration: 19 Mar 202422 Mar 2024


Conference2024 Data Compression Conference
Country/TerritoryUnited States


Dive into the research topics of 'Rank and Select on Degenerate Strings'. Together they form a unique fingerprint.

Cite this