On the Approximation Ratio of Lempel-Ziv Parsing

Travis Gagie, Gonzalo Navarro, Nicola Prezza

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

70 Downloads (Pure)


Shannon’s entropy is a clear lower bound for statistical compression. The situation is not so well understood for dictionary-based compression. A plausible lower bound is b, the least number of phrases of a general bidirectional parse of a text, where phrases can be copied from anywhere else in the text. Since computing b is NP-complete, a popular gold standard is z, the number of phrases in the Lempel-Ziv parse of the text, where phrases can be copied only from the left. While z can be computed in linear time, almost nothing has been known for decades about its approximation ratio with respect to b. In this paper we prove that $$z=O(b\log (n/b))$$ z = O ( b log ( n / b ) ) , where n is the text length. We also show that the bound is tight as a function of n, by exhibiting a string family where $$z = \varOmega (b\log n)$$ z = Ω ( b log n ) . Our upper bound is obtained by building a run-length context-free grammar based on a locally consistent parsing of the text. Our lower bound is obtained by relating b with r, the number of equal-letter runs in the Burrows-Wheeler transform of the text. On our way, we prove other relevant bounds between compressibility measures.
Original languageEnglish
Title of host publicationLatin 2018: Theoretical Informatics
Publication date2018
ISBN (Print)9783319774039
Publication statusPublished - 2018
Event13th Latin American Theoretical INformatics Symposium - Cultural Center Borges, Buenos Aires, Argentina
Duration: 16 Apr 201819 Apr 2018
Conference number: 13


Conference13th Latin American Theoretical INformatics Symposium
LocationCultural Center Borges
CityBuenos Aires
Internet address
SeriesLecture Notes in Computer Science


  • Computer Science
  • Algorithm Analysis and Problem Complexity
  • Computer Systems Organization and Communication Networks
  • Mathematics of Computing
  • Data Structures
  • Computer Graphics
  • Artificial Intelligence (incl. Robotics)

Fingerprint Dive into the research topics of 'On the Approximation Ratio of Lempel-Ziv Parsing'. Together they form a unique fingerprint.

Cite this