### Abstract

We consider the problem of indexing a string t of length n to report the occurrences of a query pattern p containing m characters and j wildcards. Let occ be the number of occurrences of p in t, and σ the size of the alphabet. We obtain the following results. A linear space index with query time O(m+σjloglog n+occ). This significantly improves the previously best known linear space index by Lam et al. (in Proc. 18th ISAAC, pp. 846-857, [2007]), which requires query time Θ(jn) in the worst case. An index with query time O(m+j+occ) using space {Mathematical expression}, where k is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time. A time-space trade-off, generalizing the index by Cole et al. (in Proc. 36th STOC, pp. 91-100, [2004]). We also show that these indexes can be generalized to allow variable length gaps in the pattern. Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest.

Original language | English |
---|---|

Journal | Theory of Computing Systems |

Volume | 55 |

Issue number | 1 |

Pages (from-to) | 41-60 |

Number of pages | 20 |

ISSN | 1432-4350 |

DOIs | |

Publication status | Published - 2014 |

### Keywords

- LCP data structure
- String indexing
- Suffix tree
- Variable length gap
- Wildcard

## Cite this

Bille, P., Gørtz, I. L., Vildhøj, H. W., & Vind, S. J. (2014). String Indexing for Patterns with Wildcards.

*Theory of Computing Systems*,*55*(1), 41-60. https://doi.org/10.1007/s00224-013-9498-4