Abstract
The compressed indexing problem is to preprocess a string S of length n into a compressed representation that supports pattern matching queries. That is, given a string P of length m report all occurrences of P in S.We present a data structure that supports pattern matching queries in O(m + occ(lg lg n + lg(epsilon) z)) time using O(z lg(n/z)) space where z is the size of the LZ77 parse of S and epsilon > 0 is an arbitrarily small constant, when the alphabet is small or z = O(n(1-delta)) for any constant delta > 0. We also present two data structures for the general case; one where the space is increased by O(z lg lg z), and one where the query time changes from worst-case to expected. These results improve the previously best known solutions. Notably, this is the first data structure that decides if P occurs in S in O(m) time using O(z lg(n/z)) space. Our results are mainly obtained by a novel combination of a randomized grammar construction algorithm with well known techniques relating pattern matching to 2D-range reporting.
Original language | English |
---|---|
Title of host publication | LATIN 2018: Theoretical Informatics |
Editors | Michael A. Bender , Martin Farach-Colton , Miguel A. Mosteiro |
Number of pages | 15 |
Volume | 10807 |
Publisher | Springer |
Publication date | 2018 |
Pages | 331-345 |
DOIs | |
Publication status | Published - 2018 |
Event | 13th Latin American Theoretical INformatics Symposium - Cultural Center Borges, Buenos Aires, Argentina Duration: 16 Apr 2018 → 19 Apr 2018 Conference number: 13 https://latin2018.dc.uba.ar/# |
Conference
Conference | 13th Latin American Theoretical INformatics Symposium |
---|---|
Number | 13 |
Location | Cultural Center Borges |
Country/Territory | Argentina |
City | Buenos Aires |
Period | 16/04/2018 → 19/04/2018 |
Internet address |
Series | Lecture Notes in Computer Science |
---|---|
ISSN | 0302-9743 |