Expresso: Comprehensively Reasoning About External Routes Using Symbolic Simulation

Hanyang_Shao · July 30, 2024, 11:37am

Title: Expresso: Comprehensively Reasoning About External Routes Using Symbolic Simulation

Authors : Dan Wang (Xi’an Jiaotong University); Peng Zhang (Xi’an Jiaotong University); Aaron Gember-Jacobson (Colgate University)

Scribe: Hanyang Shao

Introduction
The purpose of network verification tools is to ensure the accuracy and reliability of network configurations. These tools detect potential bugs induced by failures by simulating various failure environments or scenarios to check whether routing/forwarding properties remain valid under a certain number of node or link failures. However, bugs caused by external routes have not been well studied. They are usually related to the routing policy changes of neighboring networks, which can be unpredictable and lead to changes in routing paths and potential traffic hijacking. These issues may not be immediately apparent, but when they occur, they can often lead to severe service disruptions and security vulnerabilities. Existing network verification tools have limitations when dealing with external route environments. Some tools like Hoyan, Tiramisu and Batfish require a concrete set of route advertisements for verification, and enumerating all possible external route advertisements is an infeasible task due to the vast and ever-changing route space. Other tools, e.g., NV, Minesweeper, while allowing external neighbors to symbolically advertise arbitrary sets of routes, are limited by their underlying solvers, which cannot efficiently handle the large variable space required. Expresso addresses this issue by using symbolic simulation to identify equivalences in the space of external routes, effectively scaling the analysis of routing/forwarding properties to all external route environments. The design philosophy of Expresso is to abstract and simplify the complexity of external routes, enabling network verification tools to comprehensively assess the robustness of network configurations against arbitrary changes in external routes within a manageable computational scope.

Key idea and contribution
Symbolic Simulation: Expresso employs symbolic simulation techniques to explore the equivalences in the space of external routes by symbolically representing the potential routes advertised by external neighbors.
Route Equivalence Classes (RECs): Expresso discovers route equivalence classes by symbolically executing the network control plane. This means that prefixes with the same routing policies across the network are treated as equivalent, reducing the complexity of the routing space.
Packet Equivalence Classes (PECs): Expresso further symbolically executes the network data plane to generate packet equivalence classes, which have the same forwarding path in the network based on specific external route environments.
Symbolic Execution of Control and Data Planes:

Control Plane: Expresso uses the Expresso Path Vector Protocol (EPVP) to simulate route computation. EVPP is a symbolic variant of routing protocols that can handle routes with symbolic prefixes and advertiser variables.
Data Plane: Expresso converts symbolic routes into symbolic forwarding rules, accounting for the longest prefix match (LPM) semantics, and generates symbolic FIBs.

Property Analysis: Expresso can analyze various routing and forwarding properties based on RECs and PECs, such as RouteLeakFree, RouteHijackFree, and TrafficHijackFree. RouteLeakFree ensures that routes received from a peer or provider are not leaked to another peer or provider. Expresso checks for this property by examining the symbolic RIB (Routing Information Base) at each external node, verifying whether there are any routes that originate from other internal nodes or itself. RouteHijackFree ensures that routes originating from a neighbor are not selected as the best routes for an internal prefix. Expresso assesses this by reviewing the symbolic RIB at each internal node, ensuring that if there are routes containing an internal prefix, these routes are not preferred over those that originated within the network. TrafficHijackFree ensures that traffic between internal nodes of a network only traverses the network’s own routers. Expresso checks for this by using PECs (Packet Equivalence Classes) to determine if any packets have a final state of EXIT and if their destination does not match the internal prefixes.

Evaluation
Expresso was applied to the WAN of a large cloud service provider and Internet2 to detect configurations that may lead to route leaks, route hijacks, and traffic hijacks. The performance of Expresso was measured in terms of its ability to identify property violations and its computational efficiency. The tool was compared with existing verifiers like Minesweeper and Bagpipe to demonstrate its effectiveness and scalability. Expresso discovered a significant number of configuration issues that could potentially lead to route leaks, route hijacks, and traffic hijacks, as shown below. Notably, some of these issues were confirmed and rectified by network operators, validating Expresso’s accuracy in identifying real-world problems. Expresso is, to the best of our knowledge, the only verifier that can check the correctness of WANs amidst arbitrary external routes within a tractable amount of time, while other verifiers time out after one day. Expresso’s innovative approach to network verification addresses the challenge of reasoning about the vast space of possible external routes, offering a more comprehensive and scalable solution compared to existing verifiers.

Q: l want to ask what is the max number of BDD variables. For example, if you have 10 prefixes and you have 10 external neighbors in your network, you may need to build 100 BDD variables. lf you consider a community, the number of BDD variables may go further?
A: Yeah. Firstly, you said, if you have 10 prefixes and you have 10 external neighbors in your network, we only have 38 variables to encode the prefixes, and then 10 to encode the neighbors. So that’s not 100. For the next question, you said about symbolic AS paths and AS communities, for the datasets we are using, the number won’t be in too much. In the experiment, for the control plane, we only need 38. And for the data plane, as l said, we have to enlarge the number of variables for each neighbor. But in practice, we found that only 8 variables for each external neighbor are enough.

Personal thoughts
Expresso introduces a groundbreaking approach to network verification by employing symbolic simulation, effectively tackling the vastness of external route spaces that existing tools struggle to manage. Its integration into network management processes could substantially enhance the reliability and security of networks. By identifying vulnerabilities such as route leaks and hijacks, Expresso empowers network operators to proactively fortify their networks against potential misconfigurations and external threats. Expresso’s practical utility is underscored by its successful application in the WAN of a large cloud service provider and the Internet2 network. It has proven its ability to process real-world network configurations and uncover potential misconfigurations. While Expresso demonstrates remarkable capabilities in diagnosis, the paper acknowledges limitations in handling certain BGP attributes and route dependencies. Moreover, the current version of Expresso does not include an automatic repair feature, which might lead to inconvenience in fully automated network operations. The absence of this feature means that, for now, Expresso serves as a diagnostic tool that requires human intervention for the resolution of identified issues.