Abstract
Given strings P and Q the (exact) string matching problem is to find all positions of substrings in Q matching P. The classical Knuth-Morris-Pratt algorithm [SIAM J. Comput., 1977] solves the string matching problem in linear time which is optimal if we can only read one character at the time. However, most strings are stored in a computer in a packed representation with several characters in a single word, giving us the opportunity to read multiple characters simultaneously. In this paper we study the worst-case complexity of string matching on strings given in packed representation. Let m <= n be the lengths P and Q, respectively, and let sigma denote the size of the alphabet. On a standard unit-cost word-RAM with logarithmic word size we present an algorithm using time
O(n/log(sigma) n + m + occ)
Here occ is the number of occurrences of P in Q. For m = o(n) this improves the O(n) bound of the Knuth-Morris-Pratt algorithm. Furthermore, if m = O(n/log(sigma) n) our algorithm is optimal since any algorithm must spend at least Omega((n+m) log sigma/log n + occ) = Omega(n/log sigma(n)/ + occ) time to read the input and report all occurrences. The result is obtained by a novel automaton construction based on the Knuth-Morris-Pratt algorithm combined with a new compact representation of subautomata allowing an optimal tabulation-based simulation.
O(n/log(sigma) n + m + occ)
Here occ is the number of occurrences of P in Q. For m = o(n) this improves the O(n) bound of the Knuth-Morris-Pratt algorithm. Furthermore, if m = O(n/log(sigma) n) our algorithm is optimal since any algorithm must spend at least Omega((n+m) log sigma/log n + occ) = Omega(n/log sigma(n)/ + occ) time to read the input and report all occurrences. The result is obtained by a novel automaton construction based on the Knuth-Morris-Pratt algorithm combined with a new compact representation of subautomata allowing an optimal tabulation-based simulation.
Original language | English |
---|---|
Title of host publication | Combinatorial Pattern Matching |
Publication date | 2009 |
Pages | 116-126 |
ISBN (Print) | 978-3-642-02440-5 |
DOIs | |
Publication status | Published - 2009 |
Event | 20th Annual Symposium on Combinatorial Pattern Matching - Lille, France Duration: 22 Jun 2009 → 24 Jun 2009 Conference number: 20 |
Conference
Conference | 20th Annual Symposium on Combinatorial Pattern Matching |
---|---|
Number | 20 |
Country/Territory | France |
City | Lille |
Period | 22/06/2009 → 24/06/2009 |
Series | Lecture Notes in Computer Science |
---|---|
Volume | 5577 |
ISSN | 0302-9743 |