Abstract
The Lempel-Ziv (LZ) 77 factorization of a string is a widely used algorithmic tool that plays a central role in compression and indexing. For a length-n string over a linearly-sortable alphabet, e.g.,Σ = {1,...,σ} with σ = nO(1), it can be computed in O(n) time. It is unknown whether this time can be achieved for the rightmost LZ parsing, where each referencing phrase points to its rightmost previous occurrence. The currently best solution takes O(n(1 + log σ/√log n))time (Belazzougui & Puglisi SODA2016). We show that this problem is much easier to solve for the LZ-End factorization (Kreft & NavarroDCC2010), where the rightmost factorization can be obtained in O(n) time for the greedy parsing (with phrases of maximal length), and in O(n + z√log z) time for any LZ-End parsing of z phrases. We also make advances towards a linear time solution for the general case. We show how to solve multiple non-trivial subsets of the phrases of any LZ-like parsing in O(n) time. As a prime example, we can find the rightmost occurrence of all phrases of length Ω(log6.66 n/ log2 σ) in O(n/ logσn) time and space.
Original language | English |
---|---|
Title of host publication | Proceedings of the 30th International Symposium on String Processing and Information Retrieval, SPIRE |
Volume | 14240 |
Publisher | Springer |
Publication date | 2023 |
Pages | 188-202 |
ISBN (Print) | 978-3-031-43979-7 |
ISBN (Electronic) | 978-3-031-43980-3 |
DOIs | |
Publication status | Published - 2023 |
Event | 30th International Symposium on String Processing and Information Retrieval - Pisa, Italy Duration: 26 Sept 2023 → 28 Sept 2023 |
Conference
Conference | 30th International Symposium on String Processing and Information Retrieval |
---|---|
Country/Territory | Italy |
City | Pisa |
Period | 26/09/2023 → 28/09/2023 |
Keywords
- Lempel-Ziv
- LZ77
- Rightmost LZ
- LZ-End
- Lossless compression
- String algorithms
- Linear time algorithms
- Word-packing