String indexing with compressed patterns

Philip Bille, Inge Li Gørtz, Teresa Anna Steiner

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review


Abstract

Given a string S of length n, the classic string indexing problem is to preprocess S into a compact data structure that supports efficient subsequent pattern queries. In this paper we consider the basic variant where the pattern is given in compressed form and the goal is to achieve query time that is fast in terms of the compressed size of the pattern. This captures the common client-server scenario, where a client submits a query and communicates it in compressed form to a server. Instead of the server decompressing the query before processing it, we consider how to efficiently process the compressed query directly. Our main result is a novel linear space data structure that achieves near-optimal query time for patterns compressed with the classic Lempel-Ziv 1977 (LZ77) compression scheme. Along the way we develop several data structural techniques of independent interest, including a novel data structure that compactly encodes all LZ77 compressed suffixes of a string in linear space and a general decomposition of tries that reduces the search time from logarithmic in the size of the trie to logarithmic in the length of the pattern.
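To make the setting concrete, the following is a minimal sketch of the baseline the abstract contrasts against: the client compresses the pattern with an LZ77-style parse, and the server decompresses it before searching. It is not the data structure from the paper. The phrase format (a literal character or a `(source, length)` copy, with self-overlapping copies allowed) is one common LZ77 variant chosen here as an assumption, and the function names `lz77_parse`, `lz77_decode`, and `naive_query` are hypothetical.

```python
def lz77_parse(text):
    """Greedy LZ77-style parse (assumed variant, quadratic time for clarity).

    Each phrase is either a literal character or a (source, length) copy of
    the longest match that starts at an earlier position in the text.
    """
    phrases = []
    i, n = 0, len(text)
    while i < n:
        best_src, best_len = -1, 0
        for src in range(i):
            length = 0
            # The copy may run past position i (self-referential phrase).
            while i + length < n and text[src + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_src, best_len = src, length
        if best_len == 0:
            phrases.append(text[i])          # new character: literal phrase
            i += 1
        else:
            phrases.append((best_src, best_len))
            i += best_len
    return phrases


def lz77_decode(phrases):
    """Expand a phrase list produced by lz77_parse back into a string."""
    out = []
    for phrase in phrases:
        if isinstance(phrase, str):
            out.append(phrase)
        else:
            src, length = phrase
            for k in range(length):
                out.append(out[src + k])     # copies one symbol at a time, so overlap works
    return "".join(out)


def naive_query(indexed_text, phrases):
    """Baseline server: decompress the pattern, then scan the text.

    The cost depends on the uncompressed pattern length, which is what a
    compressed-pattern index aims to avoid.
    """
    pattern = lz77_decode(phrases)
    positions = []
    start = indexed_text.find(pattern)
    while start != -1:
        positions.append(start)
        start = indexed_text.find(pattern, start + 1)
    return positions


compressed = lz77_parse("abababab")          # 3 phrases for an 8-character pattern
print(compressed)
print(naive_query("abababababab", compressed))
```

In this sketch a highly repetitive pattern shrinks to a few phrases, yet the baseline query still pays for the full decompressed length; the abstract's goal is query time that depends on the number of phrases instead.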

Original language: English
Title of host publication: Proceedings of 37th International Symposium on Theoretical Aspects of Computer Science
Editors: Christophe Paul, Markus Bläser
Number of pages: 13
Publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
Publication date: Mar 2020
Article number: LIPIcs-STACS-2020-10
ISBN (Electronic): 9783959771405
Publication status: Published - Mar 2020
Event: 37th International Symposium on Theoretical Aspects of Computer Science - Montpellier, France
Duration: 10 Mar 2020 to 13 Mar 2020

Conference

Conference: 37th International Symposium on Theoretical Aspects of Computer Science
Country/Territory: France
City: Montpellier
Period: 10/03/2020 to 13/03/2020
Sponsor: Montpellier University of Excellence I-site project, CNRS, LabEx NUMEV, Université de Montpellier, Occitanie Region District
Series: Leibniz International Proceedings in Informatics, LIPIcs
Volume: 154
ISSN: 1868-8969

Keywords

  • Compression
  • Pattern matching
  • String indexing
