Title: Software-based Live Migration for RDMA
Authors: Xiaoyu Li (Tsinghua University, Microsoft Research), Ran Shu (Microsoft Research), Yongqiang Xiong (Microsoft Research), Fengyuan Ren (Tsinghua University)
Introduction
Live migration is a critical technology in modern data centers, allowing server maintenance and load balancing without service interruption. At the same time, Remote Direct Memory Access (RDMA) has been widely deployed to accelerate distributed applications by offering high throughput and low latency. Despite the prevalence of both technologies, live migration for RDMA-enabled applications is not supported in today’s data centers. The obstacle is that RDMA communication state is managed by the hardware (RNICs) and is not externally accessible, so it cannot be extracted and moved to another server’s RNIC. Hardware modifications have been proposed, but they are not available on commodity RNICs. This paper proposes MigrRDMA, a software-based live migration system for RDMA applications that works on commodity RNICs. It introduces a software indirection layer that enables communication pre-setup, provides efficient virtualization of communication states, and handles in-flight request consistency.
Key idea and contribution
The authors built MigrRDMA, a system that enables live migration for RDMA-enabled applications purely through software, without requiring any special hardware support. The core idea is to introduce a software indirection layer within the RDMA driver. This layer intercepts control path calls to maintain the minimal state required to rebuild RDMA communications on the destination server. Unlike previous RDMA virtualization work focused on sharing and isolation, MigrRDMA’s indirection layer is specifically designed to hide the differences between the old and new RDMA connections from the application’s perspective.
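To make the record-and-rebuild idea concrete, here is a minimal sketch in Python. It is not the authors' driver code: `FakeRnic`, `IndirectionLayer`, `ControlLog`, and `replay` are hypothetical names, and the toy device only hands out queue pair numbers and remembers registered regions. The sketch assumes that logging the arguments of control path calls is enough to recreate equivalent resources on a different device.

```python
from dataclasses import dataclass, field


@dataclass
class ControlLog:
    """Ordered record of control-path calls and their arguments."""
    entries: list = field(default_factory=list)

    def record(self, call, **args):
        self.entries.append((call, dict(args)))


class FakeRnic:
    """Toy stand-in for an RNIC (hypothetical, not a real device API)."""

    def __init__(self, first_qpn):
        self._next_qpn = first_qpn
        self.mrs = []

    def create_qp(self, depth):
        # Each device assigns its own queue pair numbers.
        qpn = self._next_qpn
        self._next_qpn += 1
        return qpn

    def reg_mr(self, addr, length):
        self.mrs.append((addr, length))
        return len(self.mrs)  # toy access key


class IndirectionLayer:
    """Intercepts control-path calls, logs them, then forwards them."""

    def __init__(self, rnic):
        self.rnic = rnic
        self.log = ControlLog()

    def create_qp(self, depth):
        self.log.record("create_qp", depth=depth)
        return self.rnic.create_qp(depth)

    def reg_mr(self, addr, length):
        self.log.record("reg_mr", addr=addr, length=length)
        return self.rnic.reg_mr(addr, length)


def replay(log, rnic):
    """Rebuild the logged resources on another device, in order."""
    layer = IndirectionLayer(rnic)
    results = [getattr(layer, call)(**args) for call, args in log.entries]
    return layer, results
```

Replaying the log on a second `FakeRnic` recreates the same memory region at the same address but yields a different queue pair number, which is the kind of difference the indirection layer has to hide from the application.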
MigrRDMA’s main contribution is a set of three novel mechanisms to overcome the challenges of software-based RDMA migration. First, it enables pre-setup of RDMA communications during the memory pre-copy phase. It achieves this by ensuring RDMA-related memory structures are mapped to their original virtual addresses on the destination early in the process, allowing memory registration to proceed in parallel with memory content transfer. Second, it provides efficient virtualization of RDMA states (like Queue Pair Numbers and access keys) by maintaining translation tables between virtual and physical values, with a focus on low-overhead lookups in the data path. Third, it introduces a “wait-before-stop” mechanism to ensure the consistency of in-flight work requests. When a migration is initiated, MigrRDMA suspends new requests and waits for all active requests to complete before the final state transfer, preserving RDMA’s asynchronous semantics for the application.
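The second and third mechanisms can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the MigrRDMA implementation: the class and method names are hypothetical, only queue pair numbers are translated (access keys would follow the same pattern), and buffering requests that arrive while suspended is an assumption of this sketch rather than something the summary above confirms.

```python
class VirtualQp:
    """State the layer keeps for one application-visible queue pair."""

    def __init__(self, vqpn, physical_qpn):
        self.vqpn = vqpn
        self.physical_qpn = physical_qpn
        self.in_flight = 0      # posted but not yet completed requests
        self.suspended = False
        self.deferred = []      # requests held back while suspended


class TranslationLayer:
    def __init__(self):
        self.qps = {}           # virtual QPN -> VirtualQp
        self._next_vqpn = 1

    def add_qp(self, physical_qpn):
        vqpn = self._next_vqpn
        self._next_vqpn += 1
        self.qps[vqpn] = VirtualQp(vqpn, physical_qpn)
        return vqpn

    def post_send(self, vqpn, request):
        """Data path: translate the virtual QPN with one table lookup."""
        qp = self.qps[vqpn]
        if qp.suspended:
            qp.deferred.append(request)   # assumption: buffer, do not fail
            return None
        qp.in_flight += 1
        return qp.physical_qpn

    def complete(self, vqpn):
        self.qps[vqpn].in_flight -= 1

    def suspend_all(self):
        """Wait-before-stop, step 1: stop accepting new requests."""
        for qp in self.qps.values():
            qp.suspended = True

    def drained(self):
        """Wait-before-stop, step 2: true once nothing is in flight."""
        return all(qp.in_flight == 0 for qp in self.qps.values())

    def rebind(self, vqpn, new_physical_qpn):
        """After migration: point the same virtual QPN at the new QP."""
        qp = self.qps[vqpn]
        qp.physical_qpn = new_physical_qpn
        qp.suspended = False
        deferred, qp.deferred = qp.deferred, []
        return deferred
```

The application keeps using the same virtual number throughout. Only the table entry changes at migration time, and the final state transfer starts only once `drained()` reports that no request is still in flight.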
Evaluation
The evaluation, performed on a testbed with Mellanox ConnectX-5 RNICs, indicates that MigrRDMA is both effective and efficient. RDMA pre-setup reduced migration blackout time by up to 58% compared to a workflow without it, especially as the number of RDMA Queue Pairs increased. The “wait-before-stop” mechanism added minimal overhead relative to the total migration time. The software virtualization layer itself was lightweight, adding only 3-9% overhead (4.6 to 8.3 extra CPU cycles) to data path operations. When migrating a real-world, RDMA-based Hadoop task, MigrRDMA incurred only a 12.5% throughput loss and added just 3 seconds to the job completion time, a vast improvement over the application’s native failover mechanism. Taken together, these results suggest that a purely software-based solution can enable live migration for high-performance RDMA applications with little performance impact on the application itself and minimal downtime, making it readily deployable in modern data centers.
QA
Q1: An audience member praised the work for its ingenuity in enabling RDMA live migration on current hardware. They then asked about the future of the solution: is this software-based approach expected to be a long-term solution (“for many generations”), or is it anticipated that hardware will eventually change to natively support these features more easily?
A1: The speaker explained that they expect the solution to remain primarily software-based and not rely heavily on hardware, reasoning that implementing such complex features directly in hardware requires significant engineering effort. The goal of the work was to explore how much could be accomplished purely in software. While they acknowledged that some future live migration features might require hardware support, they believe that, because of the software-first approach, any necessary hardware component would be minimal and “not become very heavy.”
Personal thoughts
The most compelling aspect of this paper is the powerful abstraction it provides. By introducing a software indirection layer, MigrRDMA makes the entire live migration process completely transparent to the application. This is a significant achievement because it shields the application developer from the complex, hardware-specific details of RDMA state. An application can leverage the performance benefits of RDMA without sacrificing the operational flexibility of live migration, which is crucial for modern data center management.
