EP8: RedTE: Mitigating Subsecond Traffic Bursts with Real-time and Distributed Traffic Engineering, Mar. 26, 2025

Paper : RedTE: Mitigating Subsecond Traffic Bursts with Real-time and Distributed Traffic Engineering
Authors : Fei Gui, Songtao Wang, Dan Li, Li Chen, Kaihui Gao, Congcong Miao, Yi Wang.
Presenter : Mengrui Zhang, Xiamen University.
Guests of Honor : Fei Gui, Tsinghua University; Kaihui Gao, Zhongguancun Laboratory.

Q: Why add the two components of the reward function linearly? Why not multiply them or take a logarithmic form?

A (Kaihui):

  • Many experiments were done; the linear form balances link utilization against the number of updated entries.
  • Alpha is a tunable parameter chosen according to what the operator cares about more (e.g., utilization vs. update latency).
  • A linear combination is simple and intuitive, though other combinations may yield better performance (see the sketch below).

Qiao: That makes sense. Simplicity is key, but exploring more complex combinations could be worthwhile.
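
A minimal sketch of the linear reward form described in the answer above, assuming it combines a utilization term with an update-count term weighted by alpha; the function name, normalization, and default value are illustrative, not RedTE’s actual code:

```python
# Illustrative sketch (not RedTE's code): linear reward combining link
# utilization and the cost of rewriting table entries.

def reward(max_link_utilization: float, num_updated_entries: int,
           table_size: int, alpha: float = 0.5) -> float:
    """Higher reward for lower utilization and fewer updated entries."""
    util_term = -max_link_utilization                  # penalize congestion
    update_term = -num_updated_entries / table_size    # penalize churn (normalized)
    return util_term + alpha * update_term

# Alpha trades off congestion against update latency.
print(reward(max_link_utilization=0.8, num_updated_entries=12, table_size=100))
```

A multiplicative or logarithmic combination could be plugged into the same spot, which is what the question above is probing.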


Q: How is the number of updated entries calculated? Do you follow methods like those in the zUpdate paper for disruption-free and minimal updates?

A (Kaihui):

  • There is a fixed-size rule table.
  • By comparing the current action with previous RL actions, the system identifies which entries have changed.
  • Fewer changes mean better actions from the agent.

A (Fei):

  • The number of updated entries correlates linearly with the change in traffic split ratios between consecutive time periods.
  • For example, if each split ratio is represented by 100 table entries, then shifting 10% of the traffic means roughly 10 entries are updated (e.g., when the split changes from 20/80 to 10/30/60).
  • This is calculated across the whole network, not just a single router.
  • The final reward function uses a max operation over all routers.

Q: Is the total number of updated entries calculated across the entire network or per-router?

A:

It is calculated network-wide, and the reward function takes the max value across all routers.
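
Putting the two answers above together, a rough sketch of the counting, under the assumption that each router’s split ratios are quantized into 100 table entries per destination and that per-router counts are aggregated with a max (all names are illustrative):

```python
# Illustrative sketch: count how many table entries change between two
# split-ratio configurations, assuming 100 entries per destination and a max
# aggregation over routers, as described in the answers above.

def updated_entries(prev_split: list[float], new_split: list[float],
                    entries_per_dest: int = 100) -> int:
    """Entries rewritten when moving from prev_split to new_split."""
    n = max(len(prev_split), len(new_split))
    prev = prev_split + [0.0] * (n - len(prev_split))
    new = new_split + [0.0] * (n - len(new_split))
    # Each 1% of traffic shifted to a different path rewrites one entry.
    shifted = sum(max(new[i] - prev[i], 0.0) for i in range(n))
    return round(shifted * entries_per_dest)

def network_update_cost(per_router_counts: dict[str, int]) -> int:
    """Aggregate with a max over all routers, as stated above."""
    return max(per_router_counts.values())

# Example: router A changes its split from 20/80 to 10/30/60 (60% of traffic
# shifted -> ~60 entries); router B changes from 50/50 to 40/60 (~10 entries).
a = updated_entries([0.2, 0.8], [0.1, 0.3, 0.6])
b = updated_entries([0.5, 0.5], [0.4, 0.6])
print(a, b, network_update_cost({"A": a, "B": b}))
```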


Q: I’m still a little fuzzy about the first design. How do you enable cooperation or collaboration between different routing agents? Is there a routing decision involved, and if so, how do you guarantee consistency?

A:

To avoid routing loops, we use fixed forwarding paths. Our agent selects from pre-computed end-to-end paths, which guarantees there are no routing loops. These paths can be implemented using Segment Routing (SR), MPLS, or SRv6.
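
A minimal sketch of why this rules out loops: the agent only chooses split ratios over a fixed set of pre-computed end-to-end paths, so no per-hop decision can create a cycle. The topology, path sets, and names below are invented for illustration:

```python
# Illustrative sketch: agents distribute traffic over fixed, pre-computed,
# loop-free end-to-end paths (installable via SR, MPLS, or SRv6).
# The candidate paths below are invented for illustration.

CANDIDATE_PATHS = {
    ("A", "D"): [["A", "B", "D"], ["A", "C", "D"], ["A", "B", "C", "D"]],
}

def apply_action(src: str, dst: str, split_ratios: list[float]):
    """Map an agent's action (split ratios) onto the fixed candidate paths."""
    paths = CANDIDATE_PATHS[(src, dst)]
    assert len(split_ratios) == len(paths)
    assert abs(sum(split_ratios) - 1.0) < 1e-6
    # Every candidate is a complete, pre-verified end-to-end route, so no
    # combination of split ratios can form a forwarding loop.
    return list(zip(paths, split_ratios))

print(apply_action("A", "D", [0.5, 0.3, 0.2]))
```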


Q: Did you use source routing like in the Decentralized SDN paper from Google and Berkeley presented at SIGCOMM 2024?

A:

Yes, we use source routing. The approach is similar in essence, although there may be differences in details.


Q: How do you handle topology or routing changes, like link down, link up, or the addition of new routers? If I update the set of pre-configured paths, do you need to retrain the model?

A:

  • We can handle scenarios where several links fail, but we cannot handle cases where new routers or links are added.
  • When a link is down, paths through it are treated as very congested (utilization > 100%), and the model automatically adjusts the traffic split (see the sketch below).
  • In such scenarios we don’t necessarily retrain the model from scratch, but there are limits to the topological changes we can handle.
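
A small sketch of the failure signal described above, assuming link utilization is part of the agent’s observation and a failed link is reported with a sentinel value well above 100% (names and the sentinel are illustrative):

```python
# Illustrative sketch: encode a failed link as an extremely high utilization so
# the agent steers traffic away from paths that traverse it. The 10.0 sentinel
# (i.e., 1000%) and all names are assumptions for illustration.

FAILED_UTILIZATION = 10.0

def observed_utilization(link_load: dict[str, float],
                         link_capacity: dict[str, float],
                         failed_links: set[str]) -> dict[str, float]:
    """Build the utilization part of the agent's observation."""
    util = {}
    for link, load in link_load.items():
        if link in failed_links:
            util[link] = FAILED_UTILIZATION      # treated as "very congested"
        else:
            util[link] = load / link_capacity[link]
    return util

print(observed_utilization({"A-B": 40.0, "B-D": 70.0},
                           {"A-B": 100.0, "B-D": 100.0},
                           failed_links={"B-D"}))
```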

Q: The open-source code seems unmaintained or out of sync with the paper. Do you still maintain or update it?

A:

We only open-sourced part of the code because the project is a collaboration with another organization that restricts full open-sourcing due to privacy and NDA agreements.


Q: In your learning process, are the state transitions consistent with standard reinforcement learning, or is there a unique design?

A:

Yes, there is a unique challenge. In traffic engineering the environment is dynamic and traffic is continuously injected, which makes it an input-driven environment.

  • State transitions are affected both by the agent’s actions and incoming traffic.
  • As traffic arrival is random, the reward can also be random — sometimes good actions receive bad rewards.
  • This makes training slower and harder to converge.
  • We proposed a traffic matrix replay method to mitigate this and stabilize training.

Q: Are the states in your model connected? How does one action influence the next state? Shouldn’t the traffic matrix arrive first and then action be taken?

A:

There are two kinds of reinforcement learning environments here: standard closed-loop environments and input-driven ones.

  • Our environment is input-driven, as traffic arrival affects state transitions.
  • We replay traffic matrices during training so agents experience similar patterns and learn to make better decisions.
  • You can look into “input-driven reinforcement learning” for deeper background; a sketch of the replay idea follows below.
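
A rough sketch of the replay idea from the two answers above: episodes are driven by recorded traffic-matrix traces, so the next state depends on both the agents’ actions and the replayed (exogenous) traffic, and different policies can be compared against the same input. The environment and policy interfaces here are assumptions for illustration:

```python
import random

# Illustrative sketch of traffic matrix replay in an input-driven environment.
# `env` is assumed to expose reset/step that accept the exogenous traffic
# matrix explicitly; this is not RedTE's actual training code.

class TrafficMatrixReplay:
    def __init__(self, recorded_traces):
        self.traces = recorded_traces        # each trace: a list of traffic matrices

    def sample_trace(self):
        """Pick one recorded trace; the whole episode replays it step by step."""
        return random.choice(self.traces)

def run_episode(env, policy, replay: TrafficMatrixReplay) -> float:
    trace = replay.sample_trace()
    state = env.reset(traffic_matrix=trace[0])
    total_reward = 0.0
    for tm in trace[1:]:
        action = policy(state)
        # Next state depends on BOTH the action and the replayed traffic matrix,
        # which removes the randomness of live traffic arrival during training.
        state, reward = env.step(action, traffic_matrix=tm)
        total_reward += reward
    return total_reward
```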

Q: Is there a central system in your deployment that collects information and interacts with all agents? How does the global critic model work?

A:

No, in the deployment phase, there is no central system.

  • Agents are fully distributed.
  • The global critic model is only used during the training phase, not in deployment; a sketch of this centralized-training, decentralized-execution pattern follows below.
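
A compact sketch of this centralized-training, decentralized-execution pattern: a global critic scores joint actions using network-wide information during training, while deployment ships only the per-router actors. The classes and placeholder logic are illustrative, not RedTE’s architecture:

```python
# Illustrative stubs for centralized training / decentralized execution.
# A global critic exists only at training time; deployed routers run local actors.

class LocalActor:
    """Runs on one router; maps local observations to split ratios."""
    def __init__(self, num_paths: int):
        self.num_paths = num_paths

    def act(self, local_state):
        # Placeholder policy: uniform split (a trained policy would go here).
        return [1.0 / self.num_paths] * self.num_paths

class GlobalCritic:
    """Training-only: scores the outcome using network-wide state."""
    def value(self, global_link_utilization: dict[str, float]) -> float:
        return -max(global_link_utilization.values())

def train_step(actors, critic, local_states, global_link_utilization):
    joint_action = [a.act(s) for a, s in zip(actors, local_states)]
    score = critic.value(global_link_utilization)   # needs global information
    return joint_action, score                      # gradient updates would use `score`

def deploy_step(actor: LocalActor, local_state):
    # Deployment: no central collector and no critic; purely local decisions.
    return actor.act(local_state)
```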

Q: What’s the next step or major challenge for RedTE or distributed traffic engineering in general?

A:

We see potential improvements in:

  • Failure tolerance.
  • More flexible routing decisions. However, increasing flexibility often introduces risks like routing loops or inconsistency, so it’s a trade-off.
  • A promising direction is combining reinforcement learning with formal verification: if an agent outputs an illegal action, it can be penalized with a bad reward (see the sketch below).
  • Formal verification could play an important role in future designs.
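
A tiny sketch of the penalty idea mentioned in the last bullets: each proposed action is checked against validity rules, and an illegal action replaces the reward with a large negative value. The checks and penalty constant are illustrative; a real design might invoke a formal-verification tool instead of these simple checks:

```python
# Illustrative sketch: penalize illegal actions, e.g. split ratios that are
# negative, do not sum to 1, or do not match the number of candidate paths.

ILLEGAL_ACTION_PENALTY = -100.0   # assumed constant for illustration

def is_legal(split_ratios: list[float], num_candidate_paths: int) -> bool:
    return (len(split_ratios) == num_candidate_paths
            and all(r >= 0.0 for r in split_ratios)
            and abs(sum(split_ratios) - 1.0) < 1e-6)

def shaped_reward(split_ratios, num_candidate_paths, raw_reward: float) -> float:
    if not is_legal(split_ratios, num_candidate_paths):
        return ILLEGAL_ACTION_PENALTY
    return raw_reward

print(shaped_reward([0.5, 0.5], 2, raw_reward=-0.8))   # legal   -> raw reward
print(shaped_reward([0.7, 0.7], 2, raw_reward=-0.8))   # illegal -> penalty
```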