Title : Relational Network Verification
Authors: Xieyang Xu, Arvind Krishnamurthy, Ratul Mahajan (University of Washington); Yifei Yuan, Ennan Zhai (Alibaba Cloud); Zachary Kincaid, David Walker (Princeton University)
Scribe: Hanyang Shao
Introduction
Network changes are among the most risky activities in network management today, often leading to unintended outages with severe consequences such as banks going offline, airlines ceasing operations, and emergency services becoming unreachable, causing businesses to lose millions of dollars. While network verification tools that analyze a single snapshot of a network have made progress, they are insufficient for ensuring network reliability during updates. Specifically speaking, traditional tools analyze specifications for a single network configuration but not changes between two snapshots (e.g., pre-change and post-change states). Thus do not effectively capture the relationships and differences between two network states. They are not scalable and cannot handle network incremental changes. Due to the limitations of existing tools, network engineers often have to rely on manual inspection to validate changes, which is time-consuming, error-prone, and not feasible for complex or large-scale networks. The paper addresses the challenge of safely implementing changes in a network environment. Specifically, it tackles the problem of how to validate that a network change, such as a router configuration update, does not inadvertently affect the network’s behavior in unwanted ways. The difficulty lies in ensuring that only the intended traffic paths are altered while all other aspects of the network remain functional and secure, without causing outages or introducing vulnerabilities. The authors introduce the concept of relational network verification and the Rela tool, designed to address these challenges by focusing on the relationships between pre- and post-change network states.
Key idea and contribution
Relational network verification is a novel approach that analyzes specifications related to two network snapshots (e.g., before and after a change), capturing their similarities and differences. This approach leads to compact and precise change specifications that mandate “no change” for the network’s behavior, except for the flows or paths that are explicitly specified to change, without needing to enumerate all other behaviors. This is in contrast to single-snapshot specifications that must list all behaviors not expected to change, making them impractical for many real-world networks.
The concept of relational network verification contrasts with traditional verification by focusing on the relationships and differences between two network snapshots rather than a single snapshot. This methodology simplifies the specification process, as it requires specifying only the changes between snapshots and ensuring the rest of the network behavior remains unchanged. This is more efficient compared to traditional methods where all network behaviors must be specified for a single snapshot, which is often impractical for large networks. The authors designed a high-level relational specification language (Relational Specification Language) that allows users to define changes succinctly. The syntax of Rela’s front-end language is shown below. This language helps in specifying the flows or paths that are expected to change between the two snapshots. Rela compiles the relational specifications and network snapshot representations into finite state transducers (FSTs). FSTs are then used to model the behavior of the network before and after changes. Finally, Rela uses decision procedures for automaton equivalence to check if the relational specifications hold true across the network snapshots. This method ensures that the network’s desired properties are maintained after changes.
Evaluation
The evaluation of Rela, the relational network verification tool, was conducted on a set of real network changes in the global backbone of a large cloud provider. The evaluation was divided into two main parts: expressiveness and performance. For expressiveness, the authors used Rela to specify the engineers’ intent for each change in the dataset by examining change tickets, which include descriptions in natural language and implementation plans. The focus was on data plane change intents, with the majority of changes involving only data plane modifications. For performance, the authors measured the time required for Rela to validate changes, considering the size of the specifications (number of atomic specs) and location granularity (router group, router, interface). They compared this with the time required for manual inspection, which includes simulating network snapshots and computing forwarding paths. Rela was compared to differential network analysis (DNA) and Batfish, both of which analyze pre- and post-change snapshots but rely on manual inspection to certify correctness. In contrast, Rela’s specifications automatically determine the validity of network changes without human intervention, providing more precise and actionable insights. The evaluation result is shown below. It showed that Rela could specify the intended data plane changes for 97% of the changes in the dataset, demonstrating its high expressiveness. For the remaining 3%, Rela could only partially specify the changes due to its current limitation in supporting path counting. Performance-wise, the validation time with Rela was found to be acceptable for the backbone network, especially when compared to manual inspection times. The two key factors affecting performance were the size of the specifications and the granularity of the analysis. Validation time increased with the size of the specs and was significantly longer for finer-grained analysis, such as at the interface level.
Q1: How well does it scale when you look at like, internet prefixes. And in the example, the setting is a single exit. But things could be more complex, which might be ECMP, across different peering points, or have multiple exits, etc. And what does the system scale to meet that kind of problem?
A1: This tool is verified on the data plane. So as a prefix, this is handled by the simulator called Hoyan. So if you’re interested in the scale of Hoyan, l can say that can scale to millions of prefixes. And the number of paths could also be scaled to millions.
Q2: One question l was wondering about was one way to get people to adopt verification tools is not to do any additional work. And here you’re still actually asking them to specify an intent. But l thought the direction would go on would be to extract intent from a change and then show it to them. And they’re willing to look at something they might not be willing to write something. Is that something you can do?
A2: Yeah, that’s a very interesting direction. Actually, we are pursuing that too. So.but at this stage, we offer other ways for the operator to specify the change intent. What l present here is a specification language, but l can imagine l would provide a web UI for the operator to use the graphics to compose those intense or we can provide templates, so the operator only needs to give some parameter values to a compose intent.
Personal thoughts
From the perspective of the evaluation result, it is highlighted that while performance is influenced by the size of the specifications and the granularity of analysis, Rela’s ability to handle fine-grained analysis without substantial performance degradation is noteworthy. Relational network verification offers a fresh perspective by focusing on the parts of the network that are affected by changes rather than the entire network. The implementation of the Rela tool substantiates the effectiveness of this approach. However, it may require network engineers to learn a new specification language and initially invest additional effort in precisely defining change specifications. Future work could explore simplifying the specification process and extending the application of relational network verification to other network verification and synthesis problems.