Compressed Indexing with Signature Grammars

Anders Roy Christiansen, Mikko Berggren Ettienne

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

154 Downloads (Pure)

Abstract

The compressed indexing problem is to preprocess a string S of length n into a compressed representation that supports pattern matching queries. That is, given a string P of length m report all occurrences of P in S.We present a data structure that supports pattern matching queries in O(m + occ(lg lg n + lg(epsilon) z)) time using O(z lg(n/z)) space where z is the size of the LZ77 parse of S and epsilon > 0 is an arbitrarily small constant, when the alphabet is small or z = O(n(1-delta)) for any constant delta > 0. We also present two data structures for the general case; one where the space is increased by O(z lg lg z), and one where the query time changes from worst-case to expected. These results improve the previously best known solutions. Notably, this is the first data structure that decides if P occurs in S in O(m) time using O(z lg(n/z)) space. Our results are mainly obtained by a novel combination of a randomized grammar construction algorithm with well known techniques relating pattern matching to 2D-range reporting.
Original languageEnglish
Title of host publicationLATIN 2018: Theoretical Informatics
EditorsMichael A. Bender , Martin Farach-Colton , Miguel A. Mosteiro
Number of pages15
Volume10807
PublisherSpringer
Publication date2018
Pages331-345
DOIs
Publication statusPublished - 2018
Event13th Latin American Theoretical INformatics Symposium - Cultural Center Borges, Buenos Aires, Argentina
Duration: 16 Apr 201819 Apr 2018
Conference number: 13
https://latin2018.dc.uba.ar/#

Conference

Conference13th Latin American Theoretical INformatics Symposium
Number13
LocationCultural Center Borges
Country/TerritoryArgentina
CityBuenos Aires
Period16/04/201819/04/2018
Internet address
SeriesLecture Notes in Computer Science
ISSN0302-9743

Fingerprint

Dive into the research topics of 'Compressed Indexing with Signature Grammars'. Together they form a unique fingerprint.

Cite this