Random Access to Grammar-Compressed Strings

Philip Bille, Gad M. Landau, Rajeev Raman, Kunihiko Sadakane, Srinivasa Rao Satti, Oren Weimann

    Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

    1 Downloads (Pure)


    Let S be a string of length N compressed into a context- free grammar S of size n. We present two representa- tions of S achieving O(logN) random access time, and either O(n-ak(n)) construction time and space on the pointer machine model, or O(n) construction time and space on the RAM. Here, k(n) is the inverse of the kth row of Ackermann's function. Our representations also efficiently support decompression of any substring in S: we can decompress any substring of length m in the same complexity as a single random access query and ad- ditional O(m) time. Combining these results with fast algorithms for uncompressed approximate string match- ing leads to several efficient algorithms for approximate string matching on grammar-compressed strings without decompression. For instance, we can nd all approximate occurrences of a pattern P with at most k errors in time O(n(minfjPjk; k4 +jPjg+logN)+occ), where occ is the number of occurrences of P in S. Finally, we are able to generalize our results to navigation and other operations on grammar-compressed trees. All of the above bounds signicantly improve the cur- rently best known results. To achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data struc- ture, two "biased" weighted ancestor data structures, and a compact representation of heavy-paths in grammars.
    Original languageEnglish
    Title of host publicationProceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms
    Place of PublicationSan Francisco
    PublisherSociety for Industrial and Applied Mathematics
    Publication date2011
    Publication statusPublished - 2011
    EventAnnual ACM-SIAM Symposium on Discrete Algorithms - San Francisco, California, USA
    Duration: 1 Jan 2011 → …
    Conference number: 22


    ConferenceAnnual ACM-SIAM Symposium on Discrete Algorithms
    CitySan Francisco, California, USA
    Period01/01/2011 → …

    Fingerprint Dive into the research topics of 'Random Access to Grammar-Compressed Strings'. Together they form a unique fingerprint.

    Cite this