PPT: A Pragmatic Transport for Datacenters

JinghuiJiang · July 30, 2024, 11:45am

Title: PPT: A Pragmatic Transport for Datacenters

Authors: Lide Suo, Yiren Pang (Tianjin University); Wenxin Li (Tianjin University & Huaxiahaorui Technology (Tianjin) Co., Ltd.); Renjie Pei, Keqiu Li, Xiulong Liu, Xin He, Yitao Hu (Tianjin University); Guyue Liu (Peking University)

Speaker: Lide Suo
Scribe: Jinghui Jiang

Introduction
This paper presents PPT, a pragmatic transport that achieves performance comparable to proactive transports while remaining as deployable as reactive transports.

The key idea of PPT is to run a low-priority control loop that uses the bandwidth left spare by the reactive transport. PPT combines two unconventional techniques, intermittent loop initialization and exponential window decrease, which let it dynamically identify and fill the spare bandwidth. PPT complements this design with a buffer-aware flow scheduling scheme that optimizes the average flow completion time (FCT) of small flows without prior knowledge of flow sizes.

Key idea and contribution:
PPT makes two core contributions. The first is dual-loop rate control, which consists of a high-priority control loop (HCP) and a low-priority control loop (LCP). HCP runs standard DCTCP, so its traffic follows DCTCP's sawtooth pattern and leaves part of the bandwidth unused. To make better use of that remaining bandwidth, PPT adds LCP on top of HCP.
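
To make the idea of leftover bandwidth concrete, here is a toy sketch of a sawtooth window and the spare capacity it leaves in each round trip. It is not DCTCP itself: the fixed halving, the one-packet increase per round, and the names `hcp_sawtooth` and `spare_per_round` are illustrative assumptions, not taken from the paper.

```python
def hcp_sawtooth(capacity, rounds, start, cut=0.5):
    """Toy sawtooth window trace (in packets), one entry per round trip.

    Assumption: the window grows by one packet per round and is cut
    multiplicatively once it exceeds the link capacity. This stands in
    for a congestion reaction and is NOT the exact DCTCP update rule.
    """
    window = start
    trace = []
    for _ in range(rounds):
        trace.append(window)
        if window > capacity:
            window = max(1, int(window * cut))  # back off after overshoot
        else:
            window += 1                         # additive increase
    return trace


def spare_per_round(trace, capacity):
    """Capacity (in packets) the high-priority loop leaves unused per round."""
    return [max(0, capacity - w) for w in trace]


trace = hcp_sawtooth(capacity=10, rounds=8, start=9)
print(trace)                       # [9, 10, 11, 5, 6, 7, 8, 9]
print(spare_per_round(trace, 10))  # [1, 0, 0, 5, 4, 3, 2, 1]
```

In this toy trace the spare capacity is largest right after a back-off and shrinks as the window climbs again, which is the gap a low-priority loop would try to fill.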

LCP uses the intermittent loop initialization mechanism to identify the maximum available bandwidth left by HCP and starts sending at that bandwidth. It then transmits under the exponential window decrease strategy so that LCP traffic does not interfere with HCP traffic. Specifically, the receiver sends one ACK to the sender for every two packets it receives, and the sender releases new data according to the number of ACKs that arrive, so the LCP window shrinks exponentially. If congestion still occurs, the receiver echoes the ECN signal to the sender, which then sends at an even slower rate.
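
The exponential window decrease can be sketched as a small simulation. This is a minimal model under stated assumptions, not the authors' implementation: it assumes the sender releases exactly one new packet per ACK, and it models the ECN reaction as an extra halving, which the talk did not specify. The function name `lcp_window_trace` and its parameters are hypothetical.

```python
def lcp_window_trace(initial_window, packets_per_ack=2, ecn_rounds=()):
    """Packets sent by a toy low-priority loop in each round trip.

    Assumptions (not from the paper): one new packet is released per ACK,
    and an ECN echo in a round halves the next window once more.
    """
    if packets_per_ack < 2:
        raise ValueError("packets_per_ack must be at least 2 for the window to shrink")
    trace = []
    window = initial_window
    rtt = 0
    while window > 0:
        trace.append(window)
        acks = window // packets_per_ack  # receiver ACKs every second packet
        window = acks                     # one new packet per ACK received
        if rtt in ecn_rounds:
            window //= 2                  # assumed extra slowdown on ECN
        rtt += 1
    return trace


print(lcp_window_trace(16))                  # [16, 8, 4, 2, 1]
print(lcp_window_trace(16, ecn_rounds={0}))  # [16, 4, 2, 1]
```

With one ACK per two packets, the window halves every round trip without any explicit congestion signal; the ECN case only shows that an additional reduction makes the loop drain faster.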

The second part is the buffer-aware flow scheduling scheme. To keep small flows from being blocked behind large flows, PPT must tell the two apart and give small flows higher priority. It does so from the amount of data passed into the buffer: PPT sets a threshold on the data that is first passed into the buffer. A flow whose first write is larger than the threshold is treated as a large flow and gets lower priority; a flow whose first write is smaller than the threshold is treated as a small flow and gets higher priority.
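
The threshold rule above can be sketched as follows. This is an illustration only: the names `classify_flow` and `schedule`, the two priority constants, the example threshold, and the choice to treat a write exactly equal to the threshold as small are all assumptions, since the talk only described the larger-than and smaller-than cases.

```python
import heapq

HIGH_PRIORITY = 0  # small flows
LOW_PRIORITY = 1   # large flows


def classify_flow(first_write_bytes, threshold_bytes):
    """Return a priority from the size of the first write into the buffer.

    Assumption: a write exactly equal to the threshold counts as small.
    """
    if first_write_bytes > threshold_bytes:
        return LOW_PRIORITY
    return HIGH_PRIORITY


def schedule(flows, threshold_bytes):
    """Order flows so that small flows go first, keeping arrival order within a class."""
    heap = []
    for arrival, (name, first_write_bytes) in enumerate(flows):
        priority = classify_flow(first_write_bytes, threshold_bytes)
        heapq.heappush(heap, (priority, arrival, name))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


flows = [("bulk", 1_000_000), ("rpc", 2_000), ("log", 500_000), ("get", 800)]
print(schedule(flows, threshold_bytes=100_000))  # ['rpc', 'get', 'bulk', 'log']
```

The point of the sketch is that no flow size needs to be declared in advance: the size of the first write alone decides the priority class.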

Evaluation
Compared to the reactive transport RC3, PPT reduces the overall average FCT by up to 92.7% and the average/tail FCT of small flows by up to 99.2%/99.9%. Compared to Homa, PPT reduces the overall average FCT by up to 46.3% and shows 25%/55.5% lower average/tail FCT for small flows in a Memcached workload.

Q&A:
Q1: Why do you ignore ECN?

A1: PPT does not ignore ECN; it uses ECN when there is congestion. LCP relies on exponential window decrease to avoid affecting HCP traffic, but congestion can still occur in some situations. When it does, PPT uses ECN to make the sender slow down further than the exponential window decrease alone would.

Q2: RDMA is widely used in data centers. Can PPT be used on RNICs?

A2: PPT is implemented in the kernel and can only be used with TCP-based transports.

Personal Thoughts
PPT is an interesting piece of work that uses LCP to fill the idle bandwidth left by the DCTCP-based HCP, which helps achieve higher bandwidth utilization during data transmission. However, PPT’s kernel-based implementation limits its use with RDMA. How to make PPT work in RDMA scenarios may be something the authors need to study next.