Shale: A Practical, Scalable Oblivious Reconfigurable Network

Gao_Han · July 30, 2024, 11:44am

Title: Shale: A Practical, Scalable Oblivious Reconfigurable Network

Authors: Daniel Amir, Nitika Saran, Tegan Wilson, Robert Kleinberg (Cornell University); Vishal Shrivastav (Purdue University); Hakim Weatherspoon (Cornell University)

Scribe: Gao Han (Xiamen University)

Introduction

As network demands grow, the power consumption and scalability limits of packet switches are becoming more significant. Traditional ORN (Oblivious Reconfigurable Network) designs are unsuitable for large-scale deployments because their latency grows linearly with system size. Existing systems either maximize throughput at the cost of latency or are not designed to handle the nanosecond-scale reconfiguration times of modern circuit switches effectively. This paper addresses the challenge of using circuit-switching technology to carry both high-throughput and low-latency traffic efficiently in data center networks.

Key idea and contribution

This paper proposes Shale, a generalization of existing single round-robin design (SRRD) ORNs that achieves a tunable tradeoff between throughput and latency scaling. Shale interleaves multiple Pareto-optimal schedules in parallel, allowing both latency-sensitive and throughput-sensitive flows to achieve optimal performance. It uses a generalized round-robin schedule composed of multiple shorter round-robins, which reduces latency. It also proposes new congestion control mechanisms, the hop-by-hop and spray-short algorithms, which work together to ensure efficient traffic management and optimal network utilization.
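
The idea of replacing one long round-robin with several shorter ones can be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: it assumes N = r^h nodes, each identified by h base-r digits, and a schedule of h phases in which a node cycles through the r - 1 other values of one digit. The function names (coords, neighbor, epoch_length) are hypothetical.

```python
def coords(node, r, h):
    """Write a node id as h base-r digits, least significant first."""
    return [(node // r**d) % r for d in range(h)]

def node_id(digits, r):
    """Inverse of coords: rebuild the node id from its digits."""
    return sum(c * r**d for d, c in enumerate(digits))

def epoch_length(r, h):
    """Timeslots needed to cycle through every phase once (assumed schedule)."""
    return h * (r - 1)

def neighbor(node, t, r, h):
    """Node that `node` is connected to in timeslot t (assumed schedule)."""
    t %= epoch_length(r, h)
    dim, step = divmod(t, r - 1)      # which digit this phase changes, and by how much
    digits = coords(node, r, h)
    digits[dim] = (digits[dim] + step + 1) % r   # shift of 1..r-1, never 0
    return node_id(digits, r)

if __name__ == "__main__":
    # Same 4096 nodes, different numbers of round-robin dimensions h.
    for r, h in [(4096, 1), (64, 2), (16, 3), (8, 4)]:
        print(f"h={h}: epoch of {epoch_length(r, h)} timeslots")
```

With h = 1 this reduces to a single round-robin over all 4095 other nodes; with h = 2, 3, and 4 the epoch shrinks to 126, 45, and 28 timeslots for the same 4096 nodes, at the cost of packets needing more hops to reach an arbitrary destination.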

To demonstrate the feasibility and scalability of Shale, the paper implements an FPGA-based prototype. The prototype shows that Shale can scale to data-center-sized networks while using significantly fewer hardware resources than existing ORN designs.

Evaluation

The evaluation of Shale was conducted through extensive simulations with networks comprising up to 10,000 nodes, showcasing its scalability and efficiency. Shale demonstrated substantial improvements in performance, memory usage, and resilience to failures compared to existing oblivious reconfigurable network designs. Shale’s mechanisms achieve close to theoretical throughput and latency guarantees, while also achieving up to 13× better tail latency and 20× better tail buffer occupancy than state-of-the-art congestion control protocols such as NDP. This result is significant because it provides a practical and scalable solution for the growing demands of modern data centers, enabling them to handle increasing bandwidth requirements.

Q1: Are all hops assumed to be of equal length? If they are not, how do you deal with reordering?

A1: We model all hops as having the same propagation delay. We have thought a little about how to deal with different propagation delays, but we don’t cover that in this work. We address reordering by using a reorder buffer at the destination.
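
A reorder buffer of the kind mentioned in A1 might look like the sketch below. The answer only states that a reorder buffer is used at the destination; the per-flow sequence numbers starting at 0 and the ReorderBuffer class are assumptions made for illustration.

```python
class ReorderBuffer:
    """Holds out-of-order packets of one flow until the gap before them is filled."""

    def __init__(self):
        self.next_seq = 0    # next sequence number to deliver (assumed to start at 0)
        self.pending = {}    # seq -> payload for packets that arrived early

    def receive(self, seq, payload):
        """Accept one packet and return the payloads that can now be delivered in order."""
        self.pending[seq] = payload
        delivered = []
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered

if __name__ == "__main__":
    rb = ReorderBuffer()
    print(rb.receive(1, "b"))   # held back: packet 0 has not arrived yet
    print(rb.receive(0, "a"))   # releases packets 0 and 1 together
```

The size of the pending dictionary is the buffer occupancy, which is the memory cost that such a design has to budget for.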

Q2: How do you deal with the scenario where there are multiple destinations and a node sends multiple flows to multiple destinations?

A2: In simulations, a round-robin approach is used between the outgoing flows. The position of the destination nodes is considered, and traffic may be sent to certain nodes for shorter paths.
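
The round-robin between outgoing flows described in A2 can be illustrated with a minimal sketch. It covers only the round-robin part, not the path-length consideration, and the flow representation (a dict from destination to a list of packets) and the round_robin function are hypothetical.

```python
from collections import deque

def round_robin(flows):
    """Yield (destination, packet) pairs, taking one packet per flow in turn."""
    queue = deque((dest, deque(pkts)) for dest, pkts in flows.items() if pkts)
    while queue:
        dest, pkts = queue.popleft()
        yield dest, pkts.popleft()
        if pkts:                      # flow still has packets: back of the line
            queue.append((dest, pkts))

if __name__ == "__main__":
    for dest, pkt in round_robin({"A": [1, 2, 3], "B": [10]}):
        print(dest, pkt)
```

For the two flows above, the send order is A, B, A, A: the short flow to B is served after a single packet of the longer flow rather than waiting for it to finish.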

Q3: Do you consider the memory consumption of the reorder buffer in your evaluation?

A3: We don’t consider it in this evaluation, but it is being considered for future work.

Personal thoughts

This paper presents an innovative and practical approach to meeting the growing demands of data center networks. It focuses on circuit switches, which offer a promising alternative to traditional packet switches as the latter become increasingly inefficient.