Paper : RedTE: Mitigating Subsecond Traffic Bursts with Real-time and Distributed Traffic Engineering
Authors : Fei Gui, Songtao Wang, Dan Li, Li Chen, Kaihui Gao, Congcong Min, Yi Wang.
Presenter : Mengrui Zhang, Xiamen University.
Guests of Honor : Fei Gui, Tsinghua University. Kaihui Gao, Zhongguancun Laboratory.
Q: Why add the two components of the reward function linearly? Why not multiply them or take a logarithmic form?
A (Kaihui):
- Many experiments were done; the linear form balances utilization and number of updated entries.
- Alpha is a tunable parameter depending on the user’s concern (e.g., utilization vs. update latency).
- Linear combination is simple and intuitive, though other combinations may yield better performance.
Qiao: That makes sense. Simplicity is key, but exploring more complex combinations could be worthwhile.
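To make the reward form concrete, here is a minimal Python sketch of a linear combination like the one described above. The function name, the choice of maximum link utilization as the first term, and the normalization by a table size are illustrative assumptions, not the paper's exact definition.

```python
def reward(max_link_util, num_updated_entries, max_entries, alpha=0.5):
    """Linear combination of the two objectives discussed above.

    max_link_util: maximum link utilization for this step (e.g., 0.0-1.0+)
    num_updated_entries: rule-table entries changed by the new action
    max_entries: normalization constant (e.g., table size); illustrative
    alpha: tunable weight trading off utilization vs. update cost
    """
    update_cost = num_updated_entries / max_entries  # normalize to [0, 1]
    # Both terms are costs, so the reward is their negated weighted sum.
    return -(max_link_util + alpha * update_cost)
```

Tuning alpha up makes the agent more conservative about touching the rule table; tuning it down prioritizes utilization, matching the trade-off described in the answer.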
Q: How is the number of updated entries calculated? Do you follow methods like those in the zUpdate paper for disruption-free and minimal updates?
A (Kaihui):
- There is a fixed-size rule table.
- By comparing the current action with previous RL actions, the system identifies which entries have changed.
- Fewer changes mean better actions from the agent.
A (Fei):
- The number of updated entries correlates linearly with the change in traffic split ratios between consecutive time periods.
- For example, if each split is scaled to 100 table entries, then a 10% change in a path's split ratio corresponds to roughly 10 updated entries; a change such as 20/80 to 10/30/60 updates entries in proportion to the traffic that moves between paths.
- This is calculated across the whole network, not just a single router.
- The final reward function uses a max operation over all routers.
Q: Is the total number of updated entries calculated across the entire network or per-router?
A:
It is calculated network-wide, and the reward function takes the max value across all routers.
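The following Python sketch is one way to interpret the entry counting described in the two answers above: the shift in split ratios is scaled to 100 table entries per destination, counts are summed per router, and the network-wide value takes the max over routers. The function names and rule-table layout are illustrative assumptions, not RedTE's actual implementation.

```python
def updated_entries_per_router(prev_split, new_split, entries_per_dest=100):
    """Estimate changed rule-table entries on one router.

    prev_split / new_split: dicts mapping destination -> list of path
    split ratios (each list sums to 1.0). The shift in split ratio is
    scaled to `entries_per_dest` entries, so a 10% shift ~ 10 entries.
    """
    changed = 0
    for dest in new_split:
        old = prev_split.get(dest, [])
        new = new_split[dest]
        # Pad the shorter ratio list so added/removed paths count as changes.
        length = max(len(old), len(new))
        old = old + [0.0] * (length - len(old))
        new = new + [0.0] * (length - len(new))
        # Fraction of traffic reassigned to different paths, scaled to entries.
        moved = sum(abs(a - b) for a, b in zip(old, new)) / 2
        changed += round(moved * entries_per_dest)
    return changed

def network_update_cost(prev_actions, new_actions):
    """Network-wide cost: max over all routers, as stated in the answer."""
    return max(updated_entries_per_router(prev_actions[r], new_actions[r])
               for r in new_actions)

# Example from the answer, under this interpretation: the per-destination
# split changes from 20/80 to 10/30/60, i.e., 60% of traffic moves paths.
prev = {"router1": {"dstX": [0.2, 0.8]}}
new = {"router1": {"dstX": [0.1, 0.3, 0.6]}}
print(network_update_cost(prev, new))  # 60 entries under this interpretation
```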
Q: I’m still a little fuzzy about the first design. How do you enable cooperation or collaboration between different routing agents? Is there a routing decision involved, and if so, how do you guarantee consistency?
A:
To avoid routing loops, we use fixed forwarding paths. Our agent selects from pre-computed end-to-end paths, which guarantees there are no routing loops. These paths can be implemented using Segment Routing (SR), MPLS, or SRv6.
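As a rough illustration of why loops cannot occur, the sketch below restricts an agent's action to split ratios over fixed, pre-computed end-to-end paths. The topology, path sets, and apply_action helper are hypothetical; the actual encoding into SR/MPLS/SRv6 segment lists is not shown.

```python
# Hypothetical example: each (src, dst) pair has k pre-computed end-to-end
# paths (e.g., encoded as segment lists for SR/SRv6). The agent never picks
# next hops, only split ratios over these paths, so routing loops are
# impossible by construction.
precomputed_paths = {
    ("A", "D"): [["A", "B", "D"], ["A", "C", "D"], ["A", "B", "C", "D"]],
}

def apply_action(src, dst, split_ratios):
    """Map an agent action (split ratios) onto the fixed candidate paths."""
    paths = precomputed_paths[(src, dst)]
    assert len(split_ratios) == len(paths)
    assert abs(sum(split_ratios) - 1.0) < 1e-6
    return list(zip(paths, split_ratios))
```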
Q: Did you use source routing like in the Decentralized SDN paper from Google and Berkeley presented at SIGCOMM 2024?
A:
Yes, we use source routing. The approach is similar in essence, although there may be differences in details.
Q: How do you handle topology or routing changes, like link down, link up, or the addition of new routers? If I update the set of pre-configured paths, do you need to retrain the model?
A:
- We can handle scenarios where several links fail, but we cannot handle cases where new routers or links are added.
- When a link is down, the affected paths are treated as very congested (utilization > 100%), and the model automatically adjusts the traffic split.
- In such scenarios, we don’t necessarily retrain the model from scratch, but there are limitations to what topological changes we can handle.
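A small sketch of this failure-handling idea, assuming the agents observe per-link utilization: a failed link is simply reported as more than 100% utilized, so the learned policy steers traffic away from it. The function and data structures are illustrative, not RedTE's actual interface.

```python
def observed_utilization(link_load, link_capacity, failed_links):
    """Build the per-link utilization observation fed to the agents.

    Failed links are reported as heavily congested (>100% utilization),
    so the learned policy shifts traffic away from them without
    retraining from scratch.
    """
    obs = {}
    for link, load in link_load.items():
        if link in failed_links:
            obs[link] = 2.0   # >100%: treat the down link as saturated
        else:
            obs[link] = load / link_capacity[link]
    return obs
```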
Q: The open-source code seems unmanaged or not aligned with the article. Do you still maintain or update it?
A:
We only open-sourced part of the code because the project is a collaboration with another organization that restricts full open-sourcing due to privacy and NDA agreements.
Q: In your learning process, are the state transitions consistent with standard reinforcement learning, or is there a unique design?
A:
Yes, there is a unique challenge. In traffic engineering, the environment is dynamic and traffic is continuously injected, which makes it an input-driven environment.
- State transitions are affected both by the agent's actions and by incoming traffic.
- Because traffic arrival is random, the reward is also partly random; sometimes good actions receive bad rewards.
- This makes training slower and harder to converge.
- We proposed a traffic matrix replay method to mitigate this and stabilize training (see the sketch below).
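A conceptual sketch of traffic matrix replay in an input-driven environment, assuming a hypothetical env.step(action, traffic_matrix) interface and a simple agent API; it shows the idea of reusing recorded traffic sequences, not RedTE's actual training loop.

```python
import random

def train_with_replay(env, agent, recorded_tm_sequences, episodes=100):
    """Replay recorded traffic matrices so that reward differences reflect
    the agent's actions rather than random traffic arrivals."""
    for _ in range(episodes):
        # Reuse a recorded traffic sequence (replay) instead of sampling
        # fresh random traffic every episode.
        tm_sequence = random.choice(recorded_tm_sequences)
        state = env.reset()
        for tm in tm_sequence:
            action = agent.act(state)
            # The next state depends on both the action and the externally
            # injected traffic matrix (input-driven environment).
            next_state, reward = env.step(action, tm)
            agent.learn(state, action, reward, next_state)
            state = next_state
```

Replaying the same traffic sequences across episodes lets the agents see comparable conditions repeatedly, which reduces reward noise and speeds up convergence.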
Q: Are the states in your model connected? How does one action influence the next state? Shouldn't the traffic matrix arrive first, and then the action be taken?
A:
There are two types of reinforcement learning environments: closed-loop and input-driven.
- Our environment is input-driven, as traffic arrival affects state transitions.
- We replay traffic matrices during training so agents experience similar patterns and learn to make better decisions.
- You can look into “input-driven reinforcement learning” for deeper understanding.
Q: Is there a central system in your deployment that collects information and interacts with all agents? How does the global critic model work?
A:
No, in the deployment phase, there is no central system.
- Agents are fully distributed.
- The global critic model is only used during the training phase and not in deployment.
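The sketch below shows the standard centralized-training, decentralized-execution pattern that this answer describes: per-router actors use only local observations (and are the only components deployed), while a global critic that sees all observations and actions exists only at training time. Class names, layer sizes, and shapes are illustrative, not RedTE's actual networks.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Per-router policy: uses only local observations, both in training
    and in deployment."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, act_dim), nn.Softmax(dim=-1))

    def forward(self, local_obs):
        # Output: split ratios over the router's candidate paths.
        return self.net(local_obs)

class GlobalCritic(nn.Module):
    """Training-time-only value network: scores the joint observations and
    actions of all agents. Never deployed."""
    def __init__(self, n_agents, obs_dim, act_dim):
        super().__init__()
        joint_dim = n_agents * (obs_dim + act_dim)
        self.net = nn.Sequential(nn.Linear(joint_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 1))

    def forward(self, all_obs, all_acts):
        return self.net(torch.cat([all_obs, all_acts], dim=-1))
```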
Q: What’s the next step or major challenge for RedTE or distributed traffic engineering in general?
A:
We see potential improvements in:
- Failure tolerance
- More flexible routing decisions
However, increasing flexibility often introduces risks like routing loops or inconsistency, so it's a trade-off.
- A promising direction is combining reinforcement learning with formal verification: if agents output illegal actions, they can be penalized with bad rewards.
- Formal verification could play an important role in future designs.
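As a toy illustration of the "penalize illegal actions" idea, the snippet below assumes a hypothetical is_valid() check standing in for a formal-verification step and simply overrides the reward when the check fails.

```python
def shaped_reward(action, base_reward, is_valid, penalty=-10.0):
    """Return a large negative reward when the verifier rejects the action,
    otherwise pass through the normal TE reward."""
    return base_reward if is_valid(action) else penalty
```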