EP24: ParserHawk: Hardware-aware parser generator using program synthesis, Dec. 17, 2025

Paper: ParserHawk: Hardware-aware parser generator using program synthesis
Authors: Xiangyu Gao, Jiaqi Gao, Karan Kumar G, Muhammad Haseeb, Ennan Zhai, Bita Dargahi, Joseph Tassarotti, Srinivas Narayana, Anirudh Sivaraman
Presenter: Yao Wang, Xiamen University
Guest of Honor: Xiangyu Gao, University of Washington

Q: Is it possible to use natural language models for the code synthesis, so that they generate an intermediate representation or optimized code based on hardware resource constraints?

A: That’s a very good question. I think LLM-driven code generation is very promising. I tried it half a year ago, and right now I’m trying it again, and I find that the capability of LLMs has increased a lot. First of all, the paper was written one year ago, and at that moment LLMs were not that powerful. Second, I think in this compilation process, correctness is very important, plus optimizations. The reason we use a program synthesizer rather than an LLM is that it can guarantee semantic correctness and also optimize the resource usage. LLMs are still relatively bad at that: they can sometimes generate results that are close to correct or close to optimal, but the last-mile gap is still there. What I think can be useful here is, first, to use the LLM to drive the code generation and guide the program synthesis procedure toward the correct result. Second, we can plug the LLM into the CEGIS (counterexample-guided inductive synthesis) loop, where we generate something, verify it, and then generate again based on the verification results; using the LLM as a subroutine there could also be useful.
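
To make the CEGIS loop mentioned in this answer concrete, here is a minimal sketch in Python. It is a toy, not ParserHawk's implementation: the 4-bit key, the `spec` function, the (mask, value) candidate shape, and the `proposer` hook are all assumptions made for illustration. The `proposer` argument marks the spot where an LLM-backed candidate generator could be plugged in as a subroutine; here it is plain enumeration, and the verifier is an exhaustive check rather than an SMT query.

```python
from typing import Callable, Iterable, List, Optional, Tuple

WIDTH = 4  # toy key width in bits (assumption for illustration)
Candidate = Tuple[int, int]   # (mask, value) of a single ternary match
Example = Tuple[int, bool]    # (key, expected result)


def spec(key: int) -> bool:
    """Reference behaviour: accept keys whose top two bits are 10."""
    return (key >> 2) == 0b10


def run_candidate(cand: Candidate, key: int) -> bool:
    """Evaluate a ternary match: compare only the bits selected by the mask."""
    mask, value = cand
    return (key & mask) == value


def enumerate_candidates() -> Iterable[Candidate]:
    """Hypothetical proposer: every (mask, value) with no value bit outside the mask."""
    for mask in range(1 << WIDTH):
        for value in range(1 << WIDTH):
            if (value & ~mask) == 0:
                yield (mask, value)


def synthesize(examples: List[Example],
               proposer: Callable[[], Iterable[Candidate]]) -> Optional[Candidate]:
    """Return the first proposed candidate that agrees with all examples so far."""
    for cand in proposer():
        if all(run_candidate(cand, k) == out for k, out in examples):
            return cand
    return None


def verify(cand: Candidate) -> Optional[int]:
    """Check the candidate on every key; return a counterexample or None."""
    for key in range(1 << WIDTH):
        if run_candidate(cand, key) != spec(key):
            return key
    return None


def cegis(proposer: Callable[[], Iterable[Candidate]] = enumerate_candidates,
          max_rounds: int = 64) -> Optional[Candidate]:
    """Generate, verify, and regenerate using the counterexamples found."""
    examples: List[Example] = []
    for _ in range(max_rounds):
        cand = synthesize(examples, proposer)
        if cand is None:
            return None          # nothing in the search space fits the examples
        cex = verify(cand)
        if cex is None:
            return cand          # verified on all inputs
        examples.append((cex, spec(cex)))
    return None


if __name__ == "__main__":
    print(cegis())
```

Each round either returns a verified candidate or adds one new counterexample, so on this toy the loop ends after at most one round per possible key, plus one.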

Q: What do you think are the fundamental challenges in this area?

A: I think there are several fundamental things. First, I use P4, which is a very specific language. If you generate code in mainstream languages like C, Python, or other systems languages, those languages have more benchmarks to train on. But for these niche domains, there are still too few examples for the LLM to get correct results. Second, the hardware architecture is also very new to the language models. What I mean is, you not only need to write semantically and syntactically correct P4 code, but also need to make sure the style of your code is hardware-aware, meaning it makes the compiler happy. Third, we still rely on the P4 compiler, so we need to know what optimizations it is doing. In order to make it happy, we need to generate something that is compiler-aware as well.

Q: How do you decide the generated code is accurate in terms of the P4 compiler?

A: Basically, what I do is write a simulator. I encode both the generated P4 program and my input P4 program, and I simulate over all packets to check whether there are inconsistencies between the original program’s output and the generated program’s output. If there are inconsistencies, meaning the two programs give different outputs for the same input, I just add those inputs to the test set, to say that the generated program should also make these test cases work.
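
The check described in this answer can be sketched as a small differential test. Everything below is a stand-in chosen for illustration, not the simulator from the paper: the 8-bit packet, the two parser functions, and the field names are hypothetical, and the generated parser is deliberately buggy so that the comparison finds something.

```python
from typing import Callable, Dict, List, Tuple

PACKET_BITS = 8  # toy packet width (assumption for illustration)
Fields = Dict[str, int]
Parser = Callable[[int], Fields]


def reference_parser(pkt: int) -> Fields:
    """Stand-in for the input program: a 4-bit type, then a payload only for type 0x8."""
    fields = {"type": pkt >> 4}
    if fields["type"] == 0x8:
        fields["payload"] = pkt & 0xF
    return fields


def generated_parser(pkt: int) -> Fields:
    """Stand-in for a buggy candidate: it looks only at the top bit of the type."""
    fields = {"type": pkt >> 4}
    if fields["type"] & 0x8:
        fields["payload"] = pkt & 0xF
    return fields


def find_mismatches(ref: Parser, gen: Parser, bits: int = PACKET_BITS) -> List[int]:
    """Run both parsers on every packet and collect the inputs where they differ."""
    return [pkt for pkt in range(1 << bits) if ref(pkt) != gen(pkt)]


# Each mismatching packet becomes a test case paired with the reference output.
test_set: List[Tuple[int, Fields]] = [
    (pkt, reference_parser(pkt))
    for pkt in find_mismatches(reference_parser, generated_parser)
]

if __name__ == "__main__":
    print(len(test_set), "counterexamples; first:", test_set[0])
```

Enumerating every packet only works at toy sizes like this one; for real headers with hundreds of bits the same question has to be answered symbolically.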

Q: Is it possible to use verification frameworks to verify the output of code synthesis?

A: Yes, I already do this. I use tools based on SMT solvers to check the result. The reason we can guarantee correctness is that an SMT solver checks that the generated code is correct. I think the solvers use brute force or other ways to check all the input and output relationships and make sure they are correct.

Q: How did you find this problem and come up with the idea of using program synthesis to tackle it?

A: The definition of program synthesis is that people write programs in a high-level language, and you generate a semantically equivalent candidate that follows your constraints. It fits the parser problem for a few reasons. First, this domain is niche, meaning the parser only supports some simple behaviors: collecting data from the input packet bitstream and assigning it to variables. This means the program synthesizer doesn’t have too many choices to explore, which simplifies the synthesis task and makes it faster. Second, this domain has relatively high requirements for optimizations: we really want to reduce resource usage, like the number of TCAM entries and parser stages. Third, most programs in the programmable network domain change infrequently. So even though we spend more time on program synthesis to find a relatively good result, that’s acceptable because we only need to synthesize it once.
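
As a small illustration of why the number of TCAM entries is worth optimizing, the sketch below compares two transition tables that behave identically but differ in size. The entry layout (state, mask, value, next state), the state names, and the 4-bit key are hypothetical choices for this example and are not taken from ParserHawk or any specific target.

```python
from typing import List, Optional, Tuple

# (state, mask, value, next_state): hypothetical entry layout for illustration
Entry = Tuple[str, int, int, str]


def lookup(table: List[Entry], state: str, key: int) -> Optional[str]:
    """First-match ternary lookup: return the next state of the first matching entry."""
    for st, mask, value, nxt in table:
        if st == state and (key & mask) == value:
            return nxt
    return None


# Naive table: one exact-match entry per key value, then a catch-all.
naive: List[Entry] = [
    ("start", 0xF, 0x8, "ipv4"),
    ("start", 0xF, 0x9, "ipv4"),
    ("start", 0xF, 0xA, "ipv4"),
    ("start", 0xF, 0xB, "ipv4"),
    ("start", 0x0, 0x0, "accept"),
]

# Compact table: a single entry with two wildcard bits covers the same four keys.
compact: List[Entry] = [
    ("start", 0xC, 0x8, "ipv4"),
    ("start", 0x0, 0x0, "accept"),
]


def equivalent(a: List[Entry], b: List[Entry], key_bits: int = 4) -> bool:
    """Check that both tables choose the same next state for every key."""
    return all(lookup(a, "start", k) == lookup(b, "start", k)
               for k in range(1 << key_bits))


if __name__ == "__main__":
    print(equivalent(naive, compact), len(naive), len(compact))
```

Both tables send keys 0x8 through 0xB to the same next state and everything else to the catch-all, but the compact one uses two entries instead of five. Finding that kind of smaller, equivalent encoding automatically is the sort of resource optimization the answer refers to.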

Q: What was the most challenging part of this work, and how long did it take?

A: I think the most challenging part was coming up with the optimization problems and optimization strategies. The reason is that the input packet bitstream can be hundreds of bits long, with headers like TCP, UDP, IPv4, and IPv6. It’s not that hard to formulate this as a program synthesis problem, but initially the synthesis took much longer. Let me give you an example: if we only have two parser nodes, like we parse TCP first and then IP, then without any optimization it takes more than 10 hours to get the result. That’s why we thought a lot about finding interesting domain-specific information in the parser problem to reduce synthesis time. This was very challenging for me initially, even making me think maybe we shouldn’t do this project because it takes too long. But fortunately, we found interesting optimizations, and finally it worked.

Q: Did you try ParserHawk on BlueField or Pensando SmartNIC?

A: Not yet, because previously I didn’t have access to a BlueField SmartNIC. Plus, BlueField has an ARM core, and I haven’t figured out a good way to treat the ARM core together with the fixed-function packet processing. I think BlueField has a somewhat different architecture, so I may need to think a bit more about how to encode the ARM core. The same goes for Pensando.

Q: What’s your next step after the parser?

A: One thing is to jointly optimize the parser and the pipeline. Right now I’m also doing some eBPF code generation, so hopefully program synthesis can play a role in that part.

Q: What’s your advice for students who want to get started in this area?

A: I think if I were a fresh student right now, I would try to chat with more senior folks, either from academia or industry, to see what the real painful problems are, and then try to solve them. There are tons of problems in the world, but some are not interesting, and some are not urgent. For system research, I think we really need to see whether the problems we solve are what people need. When you collect maybe 5 or 10 problems, you can combine them with your interests and techniques to see how to solve them. I think finding the problem is the most important thing, because the community has some consensus about what we should solve. This is very important for fresh students - later you can find your own problems, but in the beginning it’s good to see what people are interested in and solve it from your perspective.