Paper : NegotiaToR: Towards A Simple Yet Effective On-demand Reconfigurable Datacenter Network
Authors : Cong Liang, Xiangli Song, Jing Cheng, Mowei Wang, Yashe Liu,
Zhenhua Liu, Shizhen Zhao, Yong Cui
Presenter : Shaofeng Wu, The Chinese University of Hong Kong
Guest of Honor : Cong Liang, Tsinghua University
Q: How does the reliability of optical switching devices compare to traditional electrical switching? Are there unique failures specific to optical switching?
A: Optical transceivers are less reliable than cables, but optical switching offers higher bandwidth, lower cost, and lower power consumption. NegotiaToR integrates failure-handling mechanisms (e.g., dynamic reconfiguration to bypass broken paths). Future improvements in device reliability will further benefit systems like NegotiaToR.
Q: Why do electrical switches face capacity limitations post-Moore’s Law?
A: The slowdown of Moore’s Law affects CMOS-based switching chips, making it hard to keep increasing switching capacity (e.g., beyond a 51.2 Tbps forwarding rate) within power and die-area constraints. This is a limitation of the switching chip, not of link bandwidth.
Q: What challenges arise when using optical switching for AI training networks (e.g., AllReduce communication)?
A: AI workloads are predictable, enabling easier adoption for core switches and failure handling. NegotiaToR leverages nanosecond reconfiguration for efficient recovery (as in NVIDIA’s work), and optical switching provides higher bandwidth opportunities.
Q: Will fault tolerance mechanisms differ in optical switching vs. traditional architectures?
A: Yes. In NegotiaToR, a predefined phase detects failures (nodes identify missing messages), enabling dynamic link adjustments. This improves failure handling but may impact load balancing; software solutions like UCMP path selection mitigate this.
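The "nodes identify missing messages" idea can be illustrated with a minimal sketch. It assumes each ToR expects one message from every peer during the predefined phase and treats a missing message as a suspected failure; the function names and the ToR labels are hypothetical and are not taken from the paper.

```python
def detect_failed_peers(expected_peers, received_from):
    """Return peers whose predefined-phase message never arrived (assumed rule)."""
    return sorted(set(expected_peers) - set(received_from))


def usable_peers(expected_peers, failed_peers):
    """Return the peers that can still be scheduled, preserving their order."""
    failed = set(failed_peers)
    return [peer for peer in expected_peers if peer not in failed]


# Hypothetical example: tor2 stayed silent during the predefined phase.
expected = ["tor1", "tor2", "tor3", "tor4"]
received = ["tor1", "tor3", "tor4"]

failed = detect_failed_peers(expected, received)
print(failed)                          # ['tor2']
print(usable_peers(expected, failed))  # ['tor1', 'tor3', 'tor4']
```

Excluding the suspected peers leaves fewer paths for the remaining traffic, which is the load-balancing impact the answer mentions.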
Q: Are current optical switches fixed-function? Can they be programmable like P4 switches?
A: Current optical switches only set up basic connections. Future versions may integrate optical computation (e.g., arithmetic via the properties of light) for in-network processing (such as ML acceleration). Details of such programmability (e.g., latency) are discussed in papers like Lightning.
Q: Does NegotiaToR have scalability bottlenecks? If so, where?
A: The bottleneck is the predefined phase: as the number of ToR switches increases, the phase duration grows. Solutions include adding more optical switches to shorten the phase or adopting group-based methods like SHIELD.
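A rough way to see why the predefined phase grows with the number of ToRs, and why more optical switches help, is the toy model below. It assumes the phase is a round-robin in which every ToR must be connected once to each of the other ToRs, and that each ToR can reach one distinct peer per uplink port per timeslot. The function name and the port counts are illustrative assumptions, not figures from the paper.

```python
import math


def predefined_phase_slots(num_tors, ports_per_tor):
    """Timeslots for every ToR to meet every other ToR once (assumed round-robin model)."""
    if num_tors < 2 or ports_per_tor < 1:
        raise ValueError("need at least 2 ToRs and 1 port per ToR")
    # Each ToR has num_tors - 1 peers and covers ports_per_tor of them per slot.
    return math.ceil((num_tors - 1) / ports_per_tor)


print(predefined_phase_slots(128, 8))   # 16
print(predefined_phase_slots(256, 8))   # 32: doubling the ToRs doubles the phase
print(predefined_phase_slots(256, 16))  # 16: doubling the ports restores it
```

Under this model the phase length is roughly linear in the ToR count and inversely proportional to the number of optical switches each ToR attaches to.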
Q: Can NegotiaToR explicitly separate mice/elephant flows like RotorNet/Opera?
A: No. NegotiaToR uses fast-switching hardware (10 ns reconfiguration) to handle all flows with a single mechanism and a single network, which reduces operational costs. Explicit flow separation is unnecessary.
Q: If we pipeline the request-grant-accept process, what performance gain is expected?
A: Pipelining improves throughput by eliminating idle time between phases but does not reduce scheduling latency.
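The distinction between throughput and latency can be made concrete with a toy model. It assumes the three steps (request, grant, accept) each take exactly one timeslot, which is a simplification for illustration rather than the paper's timing; the function names are hypothetical.

```python
STAGES = 3  # request, grant, accept (each assumed to take one timeslot)


def schedules_completed(total_slots, pipelined):
    """Number of scheduling decisions finished within total_slots timeslots."""
    if total_slots < STAGES:
        return 0
    if pipelined:
        # A new negotiation starts every slot; the first one finishes after STAGES slots.
        return total_slots - STAGES + 1
    # Without pipelining, a negotiation starts only after the previous one ends.
    return total_slots // STAGES


def scheduling_latency():
    """Slots from a request to its accept: identical with or without pipelining."""
    return STAGES


print(schedules_completed(30, pipelined=False))  # 10
print(schedules_completed(30, pipelined=True))   # 28
print(scheduling_latency())                      # 3
```

In this model pipelining nearly triples the number of completed schedules over 30 slots, while any single request still waits three slots for its accept.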
Q: How can a traditional DCN transition to an optical-switching-based DCN?
A: Through gradual replacement (e.g., Google’s Jupiter Evolving), hybrid networks (combining optical and electrical switches), and traffic steering. The engineering effort is significant but feasible.