TY - JOUR
T1 - Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation
AU - Bille, Philip
AU - Christiansen, Anders Roy
AU - Cording, Patrick Hagge
AU - Gørtz, Inge Li
AU - Skjoldjensen, Frederik Rye
AU - Vildhøj, Hjalte Wedel
AU - Vind, Søren
PY - 2018
Y1 - 2018
N2 - Given a static reference string R and a source string S, a relative compression of S with respect to R is an encoding of S as a sequence of references to substrings of R. Relative compression schemes are a classic model of compression and have recently proved very successful for compressing highly-repetitive massive data sets such as genomes and web-data. We initiate the study of relative compression in a dynamic setting where the compressed source string S is subject to edit operations. The goal is to maintain the compressed representation compactly, while supporting edits and allowing efficient random access to the (uncompressed) source string. We present new data structures that achieve optimal time for updates and queries while using space linear in the size of the optimal relative compression, for nearly all combinations of parameters. We also present solutions for restricted and extended sets of updates. To achieve these results, we revisit the dynamic partial sums problem and the substring concatenation problem. We present new optimal or near optimal bounds for these problems. Plugging in our new results we also immediately obtain new bounds for the string indexing for patterns with wildcards problem and the dynamic text and static pattern matching problem.
AB - Given a static reference string R and a source string S, a relative compression of S with respect to R is an encoding of S as a sequence of references to substrings of R. Relative compression schemes are a classic model of compression and have recently proved very successful for compressing highly-repetitive massive data sets such as genomes and web-data. We initiate the study of relative compression in a dynamic setting where the compressed source string S is subject to edit operations. The goal is to maintain the compressed representation compactly, while supporting edits and allowing efficient random access to the (uncompressed) source string. We present new data structures that achieve optimal time for updates and queries while using space linear in the size of the optimal relative compression, for nearly all combinations of parameters. We also present solutions for restricted and extended sets of updates. To achieve these results, we revisit the dynamic partial sums problem and the substring concatenation problem. We present new optimal or near optimal bounds for these problems. Plugging in our new results we also immediately obtain new bounds for the string indexing for patterns with wildcards problem and the dynamic text and static pattern matching problem.
U2 - 10.1007/s00453-017-0380-7
DO - 10.1007/s00453-017-0380-7
M3 - Journal article
AN - SCOPUS:85031894220
SN - 0178-4617
VL - 80
SP - 3207
EP - 3224
JO - Algorithmica
JF - Algorithmica
IS - 11
ER -