Abstract
The classic string indexing problem is to preprocess a string S into a compact data structure that supports efficient subsequent pattern matching queries, that is, given a pattern string P, report all occurrences of P within S. In this paper, we study a basic and natural extension of string indexing called the string indexing for top-k close consecutive occurrences problem (Sitcco). Here, a consecutive occurrence is a pair (i, j), i < j, such that P occurs at positions i and j in S and there is no occurrence of P between i and j, and their distance is defined as j − i. Given a pattern P and a parameter k, the goal is to report the top-k consecutive occurrences of P in S of minimal distance. The challenge is to compactly represent S while supporting queries in time close to the length of P and k. We give two time-space trade-offs for the problem. Let n be the length of S, m the length of P, and ε ∈ (0, 1]. Our first result achieves O(n log n) space and optimal query time of O(m + k), and our second result achieves linear space and query time O(m + k1+ε). Along the way, we develop several techniques of independent interest, including a new translation of the problem into a line segment intersection problem and a new recursive clustering technique for trees.
Original language | English |
---|---|
Title of host publication | Proceedings of 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science |
Editors | Nitin Saxena, Sunil Simon |
Number of pages | 17 |
Publisher | Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing |
Publication date | Dec 2020 |
Article number | 14 |
ISBN (Electronic) | 9783959771740 |
DOIs | |
Publication status | Published - Dec 2020 |
Event | 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science - Virtual, Goa, India Duration: 14 Dec 2020 → 18 Dec 2020 Conference number: 40 |
Conference
Conference | 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science |
---|---|
Number | 40 |
Country/Territory | India |
City | Virtual, Goa |
Period | 14/12/2020 → 18/12/2020 |
Series | Leibniz International Proceedings in Informatics, LIPIcs |
---|---|
Volume | 182 |
ISSN | 1868-8969 |
Bibliographical note
Funding Information:Funding Philip Bille: Supported by the Danish Research Council grant DFF-8021-002498. Inge Li Gørtz: Supported by the Danish Research Council grant DFF-8021-002498. Max Rishøj Pedersen: Supported by the Danish Research Council grant DFF-8021-002498. Eva Rotenberg: Supported by the Danish Research Council grant DFF-8021-002498.
Publisher Copyright:
© Philip Bille, Inge Li Gørtz, Max Rishøj Pedersen, Eva Rotenberg, and Teresa Anna Steiner.
Keywords
- Consecutive occurrences
- Pattern matching
- String indexing