Abstract
We consider the problem of indexing a string t of length n to report the occurrences of a query pattern p containing m characters and j wildcards. Let occ be the number of occurrences of p in t, and σ the size of the alphabet. We obtain the following results. A linear space index with query time O(m+σjloglog n+occ). This significantly improves the previously best known linear space index by Lam et al. (in Proc. 18th ISAAC, pp. 846-857, [2007]), which requires query time Θ(jn) in the worst case. An index with query time O(m+j+occ) using space {Mathematical expression}, where k is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time. A time-space trade-off, generalizing the index by Cole et al. (in Proc. 36th STOC, pp. 91-100, [2004]). We also show that these indexes can be generalized to allow variable length gaps in the pattern. Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest.
Original language | English |
---|---|
Journal | Theory of Computing Systems |
Volume | 55 |
Issue number | 1 |
Pages (from-to) | 41-60 |
Number of pages | 20 |
ISSN | 1432-4350 |
DOIs | |
Publication status | Published - 2014 |
Keywords
- LCP data structure
- String indexing
- Suffix tree
- Variable length gap
- Wildcard