## Abstract

Differential privacy is the de facto privacy standard in data analysis and widely researched in various application areas. On the other hand, analyzing sequences, or

Specifically, in this paper, we consider the

-an ϵ-differentially private algorithm with an additive error of

-an ϵ-differentially private algorithm with an additive error

-an ϵ-differentially private algorithm with an additive error of

The error bounds hold with high probability. All of these algorithms return a witness, that is, if there exists a substring of S with distance at most

*strings*, is essential to many modern data analysis tasks, and those data often include highly sensitive personal data. While the problem of sanitizing sequential data to protect privacy has received growing attention, there is a surprising lack of theoretical studies of algorithms analyzing sequential data that preserve differential privacy while giving*provable guarantees on the accuracy*of such an algorithm. The goal of this paper is to initiate such a study.Specifically, in this paper, we consider the

*k*-approximate pattern matching problem under differential privacy, where the goal is to report or count all substrings of a given string*S*which have a Hamming distance at most k to a pattern P , or decide whether such a substring exists. In our definition of privacy,*individual positions of the string S*are protected. To be able to answer queries under differential privacy, we allow some slack on*k*, i.e. we allow reporting or counting substrings of*S*with a distance at most (1 + γ)*k*+ α to*P*, for a multiplicative error γ and an additive error α. We analyze which values of α and γ are necessary or sufficient to solve the*k*-approximate pattern matching problem while satisfying ϵ-differential privacy. Let n denote the length of*S*. We give-an ϵ-differentially private algorithm with an additive error of

*O*(ϵ^{−1}log*n*) and no multiplicative error for the existence variant;-an ϵ-differentially private algorithm with an additive error

*O*(ϵ^{−1}max (*k*, log*n*) · log n) for the counting variant;-an ϵ-differentially private algorithm with an additive error of

*O*(ϵ^{−1}log*n*) and multiplicative error*O*(1) for the reporting variant for a special class of patterns.The error bounds hold with high probability. All of these algorithms return a witness, that is, if there exists a substring of S with distance at most

*k*to*P*, then the algorithm returns a substring of*S*with distance at most (1 + γ*)k*+ α to*P*. Further, we complement these results by a lower bound, showing that any algorithm for the existence variant which also returns a witness must have an additive error of Ω(ϵ−1 log*n*) with constant probability.Original language | English |
---|---|

Title of host publication | Proceedings of the 15th Innovations in Theoretical Computer Science Conference (ITCS 2024) |

Volume | 287 |

Publisher | Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing |

Publication date | 2024 |

Pages | 18 |

Article number | 94 |

DOIs | |

Publication status | Published - 2024 |

Event | 15^{th} Innovations in Theoretical Computer Science Conference - Berkeley, United StatesDuration: 30 Jan 2024 → 2 Feb 2024 |

### Conference

Conference | 15^{th} Innovations in Theoretical Computer Science Conference |
---|---|

Country/Territory | United States |

City | Berkeley |

Period | 30/01/2024 → 02/02/2024 |

Series | Leibniz International Proceedings in Informatics, LIPIcs |
---|---|

ISSN | 1868-8969 |

## Keywords

- Differential privacy
- Pattern matching