Title : Transferable Neural WAN TE for Changing Topologies
Speaker : Abd AlRhman AlQiam (Purdue University)
Scribe : Siyong Huang (Xiamen University)
Authors : Abd AlRhman AlQiam (Purdue University); Yuanjun Yao, Zhaodong Wang (Meta); Satyajeet Singh Ahuja (Meta Platforms, Inc); Ying Zhang (Meta); Sanjay G. Rao, Bruno Ribeiro, Mohit Tawarmalani (Purdue University)
Introduction :
The dynamic nature of Wide Area Networks (WANs) necessitates robust Traffic Engineering (TE) solutions. Traditional TE methods relying on linear programming (LP) solvers face limitations, particularly in coping with prediction errors and scaling issues. Neural models have emerged as a promising alternative, offering better scalability and adaptability. However, existing neural TE models struggle with dynamic topologies, failing to generalize beyond the static scenarios they were trained on.
Key idea and contribution :
The key insight of this work is the recognition that existing neural traffic engineering models are limited by their inability to generalize to changing network topologies and to handle variations in tunnel configurations. This challenge arises because these models do not account for input transformations and lack alignment with optimization processes.
To overcome this, the authors propose HARP, a neural TE scheme that directly addresses these limitations. HARP’s design focuses on two main principles: preserving invariances to input transformations, such as tunnel reordering, and aligning its architecture with optimization models. By using graph neural networks and set transformers to model topologies and tunnels, and incorporating a recurrent adjustment unit for iterative refinement, HARP effectively adapts to changing network structures and prediction inaccuracies. This approach allows HARP to outperform traditional LP solvers and existing neural models, offering more robust and scalable traffic routing in dynamic WAN environments.
Evaluation :
HARP’s performance was rigorously evaluated using a combination of real-world and synthetic datasets, including AnonNet, GEANT, Abilene, and KDL. These evaluations highlighted HARP’s ability to maintain low maximum link utilization (MLU) even in the face of significant network changes. For instance, when tested on dynamic conditions involving changing topologies and traffic patterns, HARP remained within 20% of the optimal MLU 98% of the time. This performance is particularly noteworthy when compared to baselines like DOTE and TEAL, with HARP achieving an average MLU that is three times lower than that of DOTE. Furthermore, HARP outperformed the traditional LP solver Gurobi, demonstrating a 9% improvement on the 80th percentile when using predicted traffic matrices. In additional tests, HARP showed robustness to input transformations, maintaining its routing efficiency even when tunnels were reordered, a scenario where DOTE and TEAL saw a significant drop in performance. These results underscore HARP’s effectiveness in dynamic and unpredictable network environments, proving its superiority over both traditional and contemporary neural TE models.
Questions and opinions :
Q1: Can you comment on the latency of HARP in handling topology changes, such as network failures, and how long it takes to recompute routing decisions?
A1: In the largest experiments conducted, specifically with the KDL topology comprising approximately 800 nodes and 1800 links, HARP required an average of 3.2 seconds to finalize routing decisions. In contrast, the LP solver Gurobi needed around 300 seconds for similar computations. This demonstrates HARP’s efficiency in adapting to network changes.
Q2: There are concerns regarding the comparison between HARP and traditional LP solvers like Gurobi. Can you elaborate on the nature of the problem you are solving and how the comparison with Gurobi is justified?
A2: The comparison with Gurobi is based on using predicted traffic matrices to generate routing decisions and then applying these decisions to actual traffic conditions. While Gurobi was designed to handle optimal routing for exact traffic matrices, HARP addresses prediction errors and adapts to dynamic topologies more effectively. The LP solver’s performance might be impacted by inaccuracies in traffic matrix predictions, whereas HARP is trained to handle such predictions and generalize beyond its training set. The focus is on evaluating how well each approach manages prediction errors and topology changes rather than the exact constraints or objectives of the optimization problem.
Personal thoughts :
I find the approach taken by the authors to be highly innovative and practical. One aspect I particularly appreciate is the use of tunnel embeddings to handle invariances and topological changes. By modeling tunnels as sets of edges and using set transformers, HARP can effectively maintain consistency in routing decisions despite transformations like tunnel reordering. This clever design choice helps to ensure that the model’s performance remains robust across varying network configurations, a significant advancement over existing neural TE models.
However, there are potential challenges and difficulties in real-world deployment that need to be considered. Integrating HARP into existing network infrastructure may require substantial effort, especially if current systems rely heavily on traditional LP solvers. Transitioning to a neural TE approach would involve not only technical adjustments but also changes in operational procedures and staff training. Additionally, the response times for HARP to adapt to rapid topology changes in real-time scenarios could be critical, especially in networks with stringent latency requirements. Ensuring that the model can consistently deliver low-latency solutions and efficiently handle large-scale network environments are important aspects to explore further. Furthermore, continuous monitoring and updating of the model may be necessary to maintain its accuracy and effectiveness as network conditions evolve.