Title: ParserHawk: Hardware-aware parser generator using program synthesis
Authors: Xiangyu Gao (University of Washington); Jiaqi Gao (Alibaba Cloud); Karan Kumar G, Muhammad Haseeb (New York University); Ennan Zhai (Alibaba Cloud); Bili Dong (Google); Joseph Tassarotti (New York University); Srinivas Narayana (Rutgers University); Anirudh Sivaraman (New York University)
Introduction
The paper addresses the problem of generating line-rate, hardware-efficient parser implementations from high-level parser specifications (e.g., P4). Existing compilers often produce resource-inefficient implementations and sometimes outright reject parsers that are semantically implementable on the target device. ParserHawk reframes parser code generation as a semantics-driven, constrained combinatorial search problem and applies program synthesis to find implementations that are both correct and hardware-aware. The system also implements a suite of domain-specific optimizations that overcome the slowdowns of naive synthesis, yielding a 309× geometric-mean speedup.
Key idea and contribution:
The core idea is to extract a semantic specification (Spec) from the high-level parser and to express a parameterized, symbolic implementation (Impl′) as an FSM encoded with symbolic TCAM rows, extract/lookup bit selections, and transition assignments. ParserHawk encodes both the generic FSM execution semantics and a device-specific hardware profile into SMT formulas and uses a CEGIS loop with Z3 to synthesize concrete mask/value/next assignments that make Impl′ behaviorally equivalent to Spec. Because hardware constraints are encoded as parameters, the same synthesis framework is retargetable across devices (e.g., Tofino single-table vs. pipelined IPU) with only small profile changes.
A second key contribution is the set of domain-specific scalability optimizations that make synthesis practical. ParserHawk applies several targeted reductions: (Opt1) restrict candidate key bits to those used in the spec; (Opt2) temporarily collapse irrelevant field widths to 1 bit; (Opt3) preallocate field-state extraction where hardware symmetry permits; (Opt4) restrict and decompose the constant search space and parallelize mask synthesis; (Opt5) group per-field bits as indivisible units for transition key allocation; (Opt6) treat varbit fields as fixed during synthesis; and (Opt7) explore subproblems in parallel. These optimizations reduce solver work from hours or days to minutes for most benchmarks.
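To make the flavor of these reductions concrete, the sketch below illustrates the idea behind Opt1: instead of letting every header bit be a symbolic transition-key candidate, only the bits the spec actually branches on are made symbolic. The offsets and field layout here are hypothetical examples, not taken from the paper:

```python
# Hypothetical sketch of an Opt1-style reduction: restrict the solver's
# candidate key bits to those the spec actually inspects.
def spec_key_bits(match_fields):
    """match_fields: list of (bit_offset, bit_width) pairs that the
    spec's transitions branch on. Returns the sorted set of candidate
    key-bit positions handed to the solver."""
    bits = set()
    for offset, width in match_fields:
        bits.update(range(offset, offset + width))
    return sorted(bits)

# Example (made-up layout): a spec branching on a 16-bit ethertype at
# bit 96 and an 8-bit IPv4 protocol field at bit 184.
candidates = spec_key_bits([(96, 16), (184, 8)])
print(len(candidates))  # 24 symbolic key bits instead of the whole header
```

Shrinking the symbolic search space this way (and the analogous reductions in Opt2-Opt7) is what turns a days-long solver run into minutes.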
Evaluation
ParserHawk is evaluated on 58 diverse benchmarks. The paper measures (1) correctness — every synthesized implementation is semantically equivalent to the Spec; (2) resource use — synthesized outputs use no more hardware resources than the baselines; (3) retargetability — the same framework targets both Tofino and IPU profiles with only small changes to the hardware constraints; (4) scalability — with the paper's Opt1–Opt7 optimizations, >90% of benchmarks finish within 5 minutes and 44/58 within 1 minute, and the optimizations yield a 309× geometric-mean speedup over a naive synthesis baseline.
Q&A
Q1: It seems that your solution is quite good, but it does incur more compile time, right? Will this slower compilation affect practical adoption of your solution?
A1: Usually, network functions do not evolve that fast, so a parser can be compiled once and then used for a relatively long time. We also think it is acceptable to spend a little more time to find a better solution. That said, during development, when you want feedback quickly, the slower compilation could be a drawback. We will continue improving the compilation time.
Q2: Does the optimized parser code on those specific hardware architectures improve latency or packet rate, or does it just improve capacity, i.e., how many parser rules you can load into the hardware?
A2: Exactly. Basically, our synthesis aims to fully utilize the hardware.
Personal thoughts
The paper’s contribution is significant because it demonstrates that a semantics-driven, solver-based approach can be made practical, producing more hardware-efficient parser implementations and handling parsers that heuristic compilers reject outright.
While the paper reports large aggregate speedups and highlights Opt4/Opt5 with ablation numbers, it does not systematically isolate the effect of each optimization across the full benchmark suite. A more thorough ablation study would strengthen confidence in the generality of the approach.
