### Abstract

Shannon’s entropy is a clear lower bound for statistical compression. The situation is not so well understood for dictionary-based compression. A plausible lower bound is b, the least number of phrases of a general bidirectional parse of a text, where phrases can be copied from anywhere else in the text. Since computing b is NP-complete, a popular gold standard is z, the number of phrases in the Lempel-Ziv parse of the text, where phrases can be copied only from the left. While z can be computed in linear time, almost nothing has been known for decades about its approximation ratio with respect to b. In this paper we prove that $$z=O(b\log (n/b))$$ z = O ( b log ( n / b ) ) , where n is the text length. We also show that the bound is tight as a function of n, by exhibiting a string family where $$z = \varOmega (b\log n)$$ z = Ω ( b log n ) . Our upper bound is obtained by building a run-length context-free grammar based on a locally consistent parsing of the text. Our lower bound is obtained by relating b with r, the number of equal-letter runs in the Burrows-Wheeler transform of the text. On our way, we prove other relevant bounds between compressibility measures.

Original language | English |
---|---|

Title of host publication | Latin 2018: Theoretical Informatics |

Volume | 10807 |

Publisher | Springer |

Publication date | 2018 |

Pages | 490-503 |

ISBN (Print) | 9783319774039 |

DOIs | |

Publication status | Published - 2018 |

Event | 13th Latin American Theoretical INformatics Symposium - Cultural Center Borges, Buenos Aires, Argentina Duration: 16 Apr 2018 → 19 Apr 2018 Conference number: 13 https://latin2018.dc.uba.ar/# |

### Conference

Conference | 13th Latin American Theoretical INformatics Symposium |
---|---|

Number | 13 |

Location | Cultural Center Borges |

Country | Argentina |

City | Buenos Aires |

Period | 16/04/2018 → 19/04/2018 |

Internet address |

Series | Lecture Notes in Computer Science |
---|---|

Volume | 10807 |

ISSN | 0302-9743 |

### Keywords

- Computer Science
- Algorithm Analysis and Problem Complexity
- Computer Systems Organization and Communication Networks
- Mathematics of Computing
- Data Structures
- Computer Graphics
- Artificial Intelligence (incl. Robotics)

## Cite this

Gagie, T., Navarro, G., & Prezza, N. (2018). On the Approximation Ratio of Lempel-Ziv Parsing. In

*Latin 2018: Theoretical Informatics*(Vol. 10807, pp. 490-503). Springer. Lecture Notes in Computer Science, Vol.. 10807 https://doi.org/10.1007/978-3-319-77404-6_36