Flexible indexing of repetitive collections

Djamal Belazzougui, Fabio Cunial*, Travis Gagie, Nicola Prezza, Mathieu Raffinot

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

228 Downloads (Pure)


Highly repetitive strings are increasingly being amassed by genome sequencing experiments, and by versioned archives of source code and webpages. We describe practical data structures that support counting and locating all the exact occurrences of a pattern in a repetitive text, by combining the run-length encoded Burrows-Wheeler transform (RLBWT) with the boundaries of Lempel-Ziv 77 factors. One such variant uses an amount of space comparable to LZ77 indexes, but it answers count queries between two and four orders of magnitude faster than all LZ77 and hybrid index implementations, at the cost of slower locate queries. Combining the RLBWT with the compact directed acyclic word graph answers locate queries for short patterns between four and ten times faster than a version of the run-length compressed suffix array (RLCSA) that uses comparable memory, and with very short patterns our index achieves speedups even greater than ten with respect to RLCSA.

Original languageEnglish
Title of host publicationUnveiling Dynamics and Complexity - 13th Conference on Computability in Europe, CiE 2017, Proceedings
Number of pages13
Volume10307 LNCS
PublisherSpringer Verlag
Publication date2017
ISBN (Print)9783319587400
Publication statusPublished - 2017
Event13th Conference on Computability in Europe, CiE 2017 - Turku, Finland
Duration: 12 Jun 201716 Jun 2017


Conference13th Conference on Computability in Europe, CiE 2017
SeriesLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume10307 LNCS

Fingerprint Dive into the research topics of 'Flexible indexing of repetitive collections'. Together they form a unique fingerprint.

Cite this