EP2: Expresso: Comprehensively Reasoning About External Routes Using Symbolic Simulation (SIGCOMM’24), Nov. 6, 2024

Xing · November 6, 2024, 5:06pm

Paper: Expresso: Comprehensively Reasoning About External Routes Using Symbolic Simulation (Honarable mention Award, SIGCOMM’24)
Authors: Dan Wang, Peng Zhang, Aaron Gember-Jacobson

Presenter: Xing Fang, Xiamen University, China
Guest of Honor: Peng Zhang

RulanYang · November 7, 2024, 3:31pm

This discussion starts with Matt, Peng, and Qiao’s opinions about using symbolic execution for network verification and diagnosis:

Matt: What excites me about this paper is its fresh approach to building networks. Today networks are configured and tested manually by operators, and human error leads to missed vulnerabilities. But in other engineering fields—like nuclear power or aviation systems are built using formal methods, mathematical rigor, and proof-based design. I see this kind of work as bringing that rigor to networking could lead to stronger, provably correct networks and to build reliable systems.

Another interesting aspect is the focus on network verification, which started with verifying the data plane, specifically the forwarding tables. This is useful but limited, as it doesn’t account for router software or configurations. This paper goes further by verifying network configurations, which is powerful because it considers not only the network’s current behavior but also its potential responses to external routes. This provides much stronger guarantees.

He thinks there’s an interesting choice about what to make symbolic. Early work treated the data plane as fixed while making incoming packets symbolic. This paper treats configurations as fixed, while routes and traffic are symbolic. So he brought up a question: Could we make parts of the configuration symbolic too? For instance, could we ensure the router behaves correctly no matter what configuration an operator inputs?

Peng: Two years ago, we developed SRE, which treats link states as symbolic—links can be up or down without knowing the state in advance. This idea evolved to apply symbolic analysis to external routes. While making the entire configuration symbolic is challenging due to complexity, we’re exploring a partial approach. For example, by making certain configuration parameters like local preferences or community attributes symbolic, we might synthesize correct configurations and repair issues. This is complex, but as a community, we can explore ways to make it feasible.

Qiao: I’ll add to Peng’s points. Making configurations symbolic allows for localization and repair. If we focus on key variables based on common operator habits or configuration templates. That is leveraging templates to limit the complexity, we can restrict what parts of the configuration are symbolic. This allows for effective simulation and can integrate diagnostics and repairs into the emulation process. Of course, there may be other productive ways to approach this as well.

Matt: That’s a great answer. I was thinking along similar lines—maybe the next step is synthesis. Ideally, network operators shouldn’t have to write configurations from scratch, they could be synthesized instead. I wonder if this approach could generalize to configuration synthesis—perhaps this work could extend in that direction.

Then Qiao shares his thoughts about the relationship between Expresso and the paper shared last meeting:

Qiao: Since we have such a large amount of data, there’s something we can learn from it. This is one thing I’ve been thinking about—imagine if I had access to a huge amount of BGP route announcements. I could even reverse-engineer the configurations of a specific AS. If I had an infinite amount of route data, it would essentially allow us to conduct external analysis of neighboring AS configurations. This could offer a way to address the external route verification problem. I’m not saying external routes aren’t important, but reverse-engineering neighboring ISPs’ configurations might provide a different approach to tackling external route reasoning. Just a thought—though it might not necessarily make sense.

Q & A:

Q1 (Adrian Zapletal): What’s the difference between Expresso and Expresso-?

A1 (Xing Fang): Expresso models all attributes of routes symbolic while Expresso- uses concrete, instead of symbolic, AS paths for symbolic routes.

Q2 (Xing Fang): I think it’s a great paper, especially because it addresses a unique problem that other verification methods haven’t been able to handle. It focuses on scenarios involving external routes that existing network verification techniques tend to overlook. However, why previous methods didn’t target these situations or attempt to reason about external routes? It made me wonder if it’s something that real networks don’t prioritize. For instance, internal networks might control external routing prefixes through whitelisting, so maybe it’s not seen as essential.

A2 (Dan Wang, First author of Expresso): We discussed a simple workaround—checking filtering and import policies on edge routers to ensure they’re applied correctly. Newer tools, like the latest version of Batfish, now offer capabilities to verify these policies. However, we believe it’s necessary to look at the network as a whole. In BGP, for instance, it’s not just import and export policies that affect routing; session properties, like adding community tags, also play an important role in how routes are announced and exchanged.

Q3 (Xuan Chen): This article primarily discussed lPv4.Could you tell us, in the case of lPv6, aside from needing more variables, what other changes there might be?

A3 (Peng): That’s a great question. We’ve seen both IPv4 and IPv6 prefixes in configurations, but we haven’t fully tackled IPv6 yet. We expect that the space won’t be much larger than the current one, even though the address space of IPv6 is vast. The larger space in routing tends to come from communities and AS paths rather than the address space itself. So, we don’t anticipate significant challenges with IPv6, but we haven’t yet researched it thoroughly, both for configuration and data plane verification.
This is an area worth exploring further, and I think it could make for valuable future work. As far as I know, there hasn’t been much research on IPv6 verification, either in terms of configurations or databases. If any students are interested, we have a range of open-source verification tools available. It could be useful to experiment with them and see if there are any efficiency or correctness challenges—if so, that could lead to some very interesting papers in the future.

Q4 (Qiao): Do you have any interesting behind-the-scenes stories you’d like to share about working on this paper?

A4 (Peng): We were fortunate that our SRE worked well, and we were able to extend it to handle x-node routing, which worked out very smoothly. But something I’d like to mention is the importance of verification. While we’re verifying our network, we also have to consider the possibility that other networks might be misconfigured. If they advertise invalid routes, it could bring down your network, even if yours is functioning perfectly. Many of the outages on the internet are caused by BGP misconfigurations. Our goal is to avoid paying for other people’s mistakes. By improving the robustness of our configurations, we can ensure that even if a neighbor advertises an incorrect prefix, it won’t impact our network.

This brings up a broader challenge: how to model the external environment in verification. It’s a tough problem because you can’t control the entire network, only your configurations. But there’s a lot of potential for improvement in this area.

That’s a really interesting point. I’m just speculating, but there may be an impossibility theorem to be explored here. For example, if you can only modify your configuration, and you aim to make it robust, you might lose some flexibility in routing policies, similar to how the CAP theorem works in databases. So, in network configurations, achieving robustness might come at the cost of flexibility, efficiency, or other factors. This could be an interesting area for future research.