Nezha: SmartNIC-based Virtual Switch Load Sharing

Authors: Xing Li (Zhejiang University and Alibaba Cloud); Enge Song, Bowen Yang, Tian Pan, Ye Yang (Alibaba Cloud); Qiang Fu (School of Computing Technologies, RMIT University); Yang Song, Yilong Lv, Zikang Chen, Jianyuan Lu, Shize Zhang, Xiaoqing Sun, Rong Wen, Xionglie Wei (Alibaba Cloud); Biao Lyu (Zhejiang University and Alibaba Cloud); Zhigang Zong (Alibaba Cloud); Qinming He (Zhejiang University); Shunmin Zhu (Hangzhou Feitian Cloud and Alibaba Group)

Introduction

This paper addresses the tension in cloud networks between SmartNIC-based virtual switches (vSwitches) that are often underutilized on average but occasionally overloaded by a few high-demand VMs. Existing fixes are unsatisfactory: migrating heavy VMs incurs downtime, upgrading every server with bigger NICs is costly and unnecessary (overloads are rare), and even shared pools (e.g. Sirius) require new devices and in-line state replication that halves performance. The paper tackles this by enabling SmartNICs to share their unused resources: when one vSwitch is overloaded, its heavy vNICs can transparently offload packet processing to idle SmartNICs elsewhere, without adding new hardware or synchronizing state.

Key idea and contribution

The core idea of Nezha is to reuse idle SmartNICs on other hosts as a remote resource pool for overloaded vNICs, without deploying any new hardware. When a local vSwitch gets congested, Nezha transparently offloads its heavy vNICs to one or more remote front-end (FE) SmartNICs. Crucially, Nezha decouples stateless rule/flow tables from flow state: it moves only the stateless tables to the remote FEs while keeping all flow state on the original SmartNIC. This eliminates any need for continuous state synchronization between nodes. Packets carry context ("in-packet" metadata) so that either the local or remote SmartNIC can complete processing correctly.
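The decoupling above can be sketched as follows. This is an illustrative toy model, not Nezha's actual implementation: all names (`fe_process`, `backend_process`, the table layouts) are hypothetical. The point it shows is that the remote FE only does stateless lookups and stamps the result into in-packet metadata, so no per-flow state ever needs to be replicated off the original SmartNIC.

```python
from dataclasses import dataclass, field

@dataclass
class Packet:
    flow_key: tuple          # e.g. (src_ip, dst_ip, src_port, dst_port, proto)
    payload: bytes
    metadata: dict = field(default_factory=dict)  # in-packet context

# Stateless rule table: safe to copy to remote FEs without synchronization,
# because it holds no per-flow state.
RULE_TABLE = {
    ("10.0.0.1", "10.0.0.2"): {"action": "forward", "next_hop": "fe-2"},
}

def fe_process(pkt: Packet) -> Packet:
    """Remote FE: stateless table lookup only; the result rides in the packet."""
    rule = RULE_TABLE.get(pkt.flow_key[:2], {"action": "drop"})
    pkt.metadata["rule"] = rule  # carry the decision as in-packet metadata
    return pkt

# Per-flow state lives only on the original (back-end) SmartNIC.
FLOW_STATE = {}

def backend_process(pkt: Packet) -> str:
    """Back end: applies the FE's carried decision and updates local flow state."""
    FLOW_STATE.setdefault(pkt.flow_key, {"pkts": 0})
    FLOW_STATE[pkt.flow_key]["pkts"] += 1
    return pkt.metadata["rule"]["action"]
```

Because the lookup result travels inside the packet, either side can finish processing correctly without the two nodes ever exchanging state out of band.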

Evaluation

The evaluation of Nezha demonstrates its significant performance gains in both a controlled testbed and a large-scale production environment on Alibaba Cloud.

In the testbed, Nezha increased connections per second (CPS) by up to 3.3x and the number of concurrent flows by 3.8x using four remote front ends. It effectively shifted the performance bottleneck from the local SmartNIC to the VM’s CPU, reducing vSwitch CPU utilization from 70% to 10%. The additional network hop introduced by offloading added under 10μs of latency, which is negligible compared to typical cloud application latencies.

In production, Nezha was deployed on critical cloud middleboxes—a load balancer, a NAT gateway, and a transit router. It achieved impressive improvements: 3x-4.4x in CPS, 5x-50x in concurrent flows, and over 40x in the number of supported vNICs.

Crucially, Nezha achieves this with minimal deployment cost. The software development cost was only 15 person-months (a fraction of the 148 person-months required for a solution like Sailfish) as it modified less than 5% of the existing vSwitch code.

Q&A

Q1: Does your work require modifying the routing table? For example, suppose a flow should target node A, but it is selected to be offloaded for processing on node B. Will node A first receive the flow, forward it to node B for further processing, and then get it back, or will you change the routing table to forward that flow directly to node B?
A1: If the config or routing table has not been installed on the source switches, the packet may be sent directly to the back end, not the front end. If the packet is routed to the back end, it is processed at the back end; if it is routed to the front end, it is processed at the front end. But there is no routing from the back end to the front end.
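A minimal sketch of the routing behavior described in A1, with hypothetical names (`ROUTES`, `route`): the source switch's installed route decides whether a flow lands on the front end or the back end, whichever node receives the packet processes it there, and there is no back-end-to-front-end redirection.

```python
# Routes installed on the source switch. An offloaded flow gets an explicit
# front-end route; flows without an installed route default to the back end.
ROUTES = {
    "flow-heavy": "front-end",  # offload config installed upstream
}

def route(flow: str) -> str:
    """Return the node that will both receive and fully process the packet."""
    # No fallback hop exists: the chosen node is where processing happens,
    # and the back end never forwards packets onward to a front end.
    return ROUTES.get(flow, "back-end")
```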

Q2: I was curious about the reliability aspects, especially for the heavy-hitter vNICs, which now depend on multiple physical SmartNICs. It looks like this would actually decrease reliability from an availability perspective.
A2: Decoupling the logic into front ends and back ends does increase the reliability challenges. But we do active failover: if one of the front ends goes down, its traffic is routed to the other three front ends. It does, however, take about two seconds to fail over in our system.

Personal thoughts

Nezha offers an elegant and practical solution to cloud network bottlenecks by repurposing idle SmartNICs to handle traffic hotspots, using a clever decoupling of stateless rules (offloaded remotely) from stateful processing (kept locally) to avoid complex synchronization. However, the approach of carrying extra metadata in each packet introduces additional pipeline complexity and potential processing overhead, despite latency tests showing minimal impact in practice.