String Indexing for Patterns with Wildcards

Philip Bille, Inge Li Gørtz, Hjalte Wedel Vildhøj, Søren Juhl Vind

Research output: Contribution to journalJournal articleResearchpeer-review

Abstract

We consider the problem of indexing a string t of length n to report the occurrences of a query pattern p containing m characters and j wildcards. Let occ be the number of occurrences of p in t, and σ the size of the alphabet. We obtain the following results. A linear space index with query time O(m+σjloglog n+occ). This significantly improves the previously best known linear space index by Lam et al. (in Proc. 18th ISAAC, pp. 846-857, [2007]), which requires query time Θ(jn) in the worst case. An index with query time O(m+j+occ) using space {Mathematical expression}, where k is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time. A time-space trade-off, generalizing the index by Cole et al. (in Proc. 36th STOC, pp. 91-100, [2004]). We also show that these indexes can be generalized to allow variable length gaps in the pattern. Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest.
Original languageEnglish
JournalTheory of Computing Systems
Volume55
Issue number1
Pages (from-to)41-60
Number of pages20
ISSN1432-4350
DOIs
Publication statusPublished - 2014

Keywords

  • LCP data structure
  • String indexing
  • Suffix tree
  • Variable length gap
  • Wildcard

Cite this