Random Access to Grammar-Compressed Strings

Philip Bille, Gad M. Landau, Rajeev Raman, Kunihiko Sadakane, Srinivasa Rao Satti, Oren Weimann

    Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

    1 Downloads (Pure)

    Abstract

    Let S be a string of length N compressed into a context- free grammar S of size n. We present two representa- tions of S achieving O(logN) random access time, and either O(n-ak(n)) construction time and space on the pointer machine model, or O(n) construction time and space on the RAM. Here, k(n) is the inverse of the kth row of Ackermann's function. Our representations also efficiently support decompression of any substring in S: we can decompress any substring of length m in the same complexity as a single random access query and ad- ditional O(m) time. Combining these results with fast algorithms for uncompressed approximate string match- ing leads to several efficient algorithms for approximate string matching on grammar-compressed strings without decompression. For instance, we can nd all approximate occurrences of a pattern P with at most k errors in time O(n(minfjPjk; k4 +jPjg+logN)+occ), where occ is the number of occurrences of P in S. Finally, we are able to generalize our results to navigation and other operations on grammar-compressed trees. All of the above bounds signicantly improve the cur- rently best known results. To achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data struc- ture, two "biased" weighted ancestor data structures, and a compact representation of heavy-paths in grammars.
    Original languageEnglish
    Title of host publicationProceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms
    Place of PublicationSan Francisco
    PublisherSociety for Industrial and Applied Mathematics
    Publication date2011
    Pages373-389
    Publication statusPublished - 2011
    EventAnnual ACM-SIAM Symposium on Discrete Algorithms - San Francisco, California, USA
    Duration: 1 Jan 2011 → …
    Conference number: 22

    Conference

    ConferenceAnnual ACM-SIAM Symposium on Discrete Algorithms
    Number22
    CitySan Francisco, California, USA
    Period01/01/2011 → …

    Cite this

    Bille, P., Landau, G. M., Raman, R., Sadakane, K., Satti, S. R., & Weimann, O. (2011). Random Access to Grammar-Compressed Strings. In Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms (pp. 373-389). San Francisco: Society for Industrial and Applied Mathematics.
    Bille, Philip ; Landau, Gad M. ; Raman, Rajeev ; Sadakane, Kunihiko ; Satti, Srinivasa Rao ; Weimann, Oren. / Random Access to Grammar-Compressed Strings. Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms. San Francisco : Society for Industrial and Applied Mathematics, 2011. pp. 373-389
    @inproceedings{43a60370c4fb416f8d9f79a894579b52,
    title = "Random Access to Grammar-Compressed Strings",
    abstract = "Let S be a string of length N compressed into a context- free grammar S of size n. We present two representa- tions of S achieving O(logN) random access time, and either O(n-ak(n)) construction time and space on the pointer machine model, or O(n) construction time and space on the RAM. Here, k(n) is the inverse of the kth row of Ackermann's function. Our representations also efficiently support decompression of any substring in S: we can decompress any substring of length m in the same complexity as a single random access query and ad- ditional O(m) time. Combining these results with fast algorithms for uncompressed approximate string match- ing leads to several efficient algorithms for approximate string matching on grammar-compressed strings without decompression. For instance, we can nd all approximate occurrences of a pattern P with at most k errors in time O(n(minfjPjk; k4 +jPjg+logN)+occ), where occ is the number of occurrences of P in S. Finally, we are able to generalize our results to navigation and other operations on grammar-compressed trees. All of the above bounds signicantly improve the cur- rently best known results. To achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data struc- ture, two {"}biased{"} weighted ancestor data structures, and a compact representation of heavy-paths in grammars.",
    author = "Philip Bille and Landau, {Gad M.} and Rajeev Raman and Kunihiko Sadakane and Satti, {Srinivasa Rao} and Oren Weimann",
    year = "2011",
    language = "English",
    pages = "373--389",
    booktitle = "Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms",
    publisher = "Society for Industrial and Applied Mathematics",

    }

    Bille, P, Landau, GM, Raman, R, Sadakane, K, Satti, SR & Weimann, O 2011, Random Access to Grammar-Compressed Strings. in Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, San Francisco, pp. 373-389, Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, California, USA, 01/01/2011.

    Random Access to Grammar-Compressed Strings. / Bille, Philip; Landau, Gad M.; Raman, Rajeev; Sadakane, Kunihiko; Satti, Srinivasa Rao; Weimann, Oren.

    Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms. San Francisco : Society for Industrial and Applied Mathematics, 2011. p. 373-389.

    Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

    TY - GEN

    T1 - Random Access to Grammar-Compressed Strings

    AU - Bille, Philip

    AU - Landau, Gad M.

    AU - Raman, Rajeev

    AU - Sadakane, Kunihiko

    AU - Satti, Srinivasa Rao

    AU - Weimann, Oren

    PY - 2011

    Y1 - 2011

    N2 - Let S be a string of length N compressed into a context- free grammar S of size n. We present two representa- tions of S achieving O(logN) random access time, and either O(n-ak(n)) construction time and space on the pointer machine model, or O(n) construction time and space on the RAM. Here, k(n) is the inverse of the kth row of Ackermann's function. Our representations also efficiently support decompression of any substring in S: we can decompress any substring of length m in the same complexity as a single random access query and ad- ditional O(m) time. Combining these results with fast algorithms for uncompressed approximate string match- ing leads to several efficient algorithms for approximate string matching on grammar-compressed strings without decompression. For instance, we can nd all approximate occurrences of a pattern P with at most k errors in time O(n(minfjPjk; k4 +jPjg+logN)+occ), where occ is the number of occurrences of P in S. Finally, we are able to generalize our results to navigation and other operations on grammar-compressed trees. All of the above bounds signicantly improve the cur- rently best known results. To achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data struc- ture, two "biased" weighted ancestor data structures, and a compact representation of heavy-paths in grammars.

    AB - Let S be a string of length N compressed into a context- free grammar S of size n. We present two representa- tions of S achieving O(logN) random access time, and either O(n-ak(n)) construction time and space on the pointer machine model, or O(n) construction time and space on the RAM. Here, k(n) is the inverse of the kth row of Ackermann's function. Our representations also efficiently support decompression of any substring in S: we can decompress any substring of length m in the same complexity as a single random access query and ad- ditional O(m) time. Combining these results with fast algorithms for uncompressed approximate string match- ing leads to several efficient algorithms for approximate string matching on grammar-compressed strings without decompression. For instance, we can nd all approximate occurrences of a pattern P with at most k errors in time O(n(minfjPjk; k4 +jPjg+logN)+occ), where occ is the number of occurrences of P in S. Finally, we are able to generalize our results to navigation and other operations on grammar-compressed trees. All of the above bounds signicantly improve the cur- rently best known results. To achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data struc- ture, two "biased" weighted ancestor data structures, and a compact representation of heavy-paths in grammars.

    M3 - Article in proceedings

    SP - 373

    EP - 389

    BT - Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms

    PB - Society for Industrial and Applied Mathematics

    CY - San Francisco

    ER -

    Bille P, Landau GM, Raman R, Sadakane K, Satti SR, Weimann O. Random Access to Grammar-Compressed Strings. In Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms. San Francisco: Society for Industrial and Applied Mathematics. 2011. p. 373-389