Abstract
We revisit the popular delayed deterministic finite automaton (D2FA) compression algorithm introduced by Kumar et al. [SIGCOMM 2006] for compressing deterministic finite automata (DFAs) used in intrusion detection systems. This compression scheme exploits similarities in the outgoing sets of transitions among states to achieve strong compression while maintaining high throughput for matching.
Unfortunately, the D2FA algorithm and later variants of it require at least quadratic compression time since they compare all pairs of states to compute an optimal compression. This is too slow and, in some cases, even infeasible for collections of regular expression in modern intrusion detection systems that produce DFAs of millions of states.
Our main result is a simple, general framework for constructing D2FA based on locality-sensitive hashing that constructs an approximation of the optimal D2FA in near-linear time. We apply our approach to the original D2FA compression algorithm and two important variants, and we experimentally evaluate our algorithms on DFAs from widely used modern intrusion detection systems. Overall, our new algorithms compress up to an order of magnitude faster than existing solutions with either no or little loss of compression size. Consequently, our algorithms are significantly more scalable and can handle larger collections of regular expressions than previous solutions.
Unfortunately, the D2FA algorithm and later variants of it require at least quadratic compression time since they compare all pairs of states to compute an optimal compression. This is too slow and, in some cases, even infeasible for collections of regular expression in modern intrusion detection systems that produce DFAs of millions of states.
Our main result is a simple, general framework for constructing D2FA based on locality-sensitive hashing that constructs an approximation of the optimal D2FA in near-linear time. We apply our approach to the original D2FA compression algorithm and two important variants, and we experimentally evaluate our algorithms on DFAs from widely used modern intrusion detection systems. Overall, our new algorithms compress up to an order of magnitude faster than existing solutions with either no or little loss of compression size. Consequently, our algorithms are significantly more scalable and can handle larger collections of regular expressions than previous solutions.
Original language | English |
---|---|
Title of host publication | 50th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2025 |
Volume | 15538 |
Publisher | Springer |
Publication date | 2025 |
Pages | 136-150 |
ISBN (Electronic) | 978-3-031-82670-2 |
DOIs | |
Publication status | Published - 2025 |
Event | 50th International Conference on Current Trends in Theory and Practice of Computer Science - Bratislava, Slovakia Duration: 20 Jan 2025 → 23 Jan 2025 |
Conference
Conference | 50th International Conference on Current Trends in Theory and Practice of Computer Science |
---|---|
Country/Territory | Slovakia |
City | Bratislava |
Period | 20/01/2025 → 23/01/2025 |
Series | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
ISSN | 0302-9743 |
Keywords
- Algorithms
- Data Compression
- Finite Automata