OptimusPrime: Unleash Dataplane Programmability through a Transformable Architecture
Authors : ZhiKang Chen, Yong Feng, Shuxin Liu (Tsinghua University); Haoyu Song (Futurewei Technologies); Hanyi Zhou, Tong Yun, Wenquan Xu (Tsinghua University); Tian Pan (Purple Mountain Laboratories); Bin Liu (Tsinghua University)
Introduction
Thanks to their support for customizable functions, programmable network chips are becoming increasingly popular, with a growing lineup that includes NVIDIA Spectrum, AMD Pensando, Juniper Trio, Broadcom Trident, and many others. The architectures of these chips fall into three categories: pipeline, multi-core Run-To-Completion (RTC), and a hybrid of the two. Each architecture has intrinsic limitations. A pipeline delivers high throughput but is awkward at, or even incapable of, handling complex and stateful network tasks that involve long dependency chains, feedback loops, substantial computation, or large flow tables. RTC cores are flexible enough for such tasks but are less efficient; limited by die size and power budget, the number of cores that fit on a single chip easily falls short of the network throughput demand. The hybrid architecture combines a pipeline with a number of RTC processors in an attempt to close the gaps in both performance and capability. However, without knowing the application requirements in advance, a fixed juxtaposition of the two kinds of hardware can hardly achieve optimal resource allocation and performance: it is easy to end up with one type of resource in short supply while the other is in surplus. Another architectural difficulty is the interconnection between the two types of resources. Existing approaches (e.g., shared memory) may either choke the faster path or weaken the processing capability, because of the fixed resource configuration and inefficient data transfer between the two. The paper aims to design a better hybrid architecture that combines the benefits of the pipeline and multi-core RTC architectures while keeping resource utilization efficient under dynamic conditions, yielding a versatile, one-size-fits-all solution.
Key idea and contribution :
The paper proposes a hybrid architecture in which the pipeline and RTC cores coexist on transformable hardware blocks. The blocks can be partitioned flexibly between pipeline stages and RTC processors, according to the code mapping that the compiler computes for an application. The major contributions of this paper are as follows:
The authors analyze the similarities and differences between the pipeline and multi-core RTC architectures. The findings motivate the design of OptimusPrime, which supports both architectures with common building blocks. Through switch configuration, each block can function as either a pipeline stage or a multi-core RTC processor. According to the authors, this transformation capability brings a new degree of adaptability and versatility to network dataplane programming.
The authors preserve the P4 programming paradigm and integrate C programming into it, in a framework named P4X. The integration makes it possible to describe complex and stateful functions (which may be placed in RTC processors) within an overall pipeline context for packet processing. Their compiler algorithm maps each portion of the program to either pipeline stages or RTC cores, aiming for the best possible performance. This approach advances the compiler design and allows for a simple and efficient development process.
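To make the two contributions above concrete, here is a minimal Python sketch of the general idea: identical blocks that are each configured as a pipeline stage or an RTC processor, driven by a mapping from program portions to modes. It is not the paper's compiler algorithm. The names Portion, Mode, choose_mode and configure_blocks, the three trait flags, and the one-block-per-portion rule are all assumptions made purely for illustration; the traits simply mirror the task properties that the introduction says a pipeline handles poorly.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    """The two roles a transformable block can take (hypothetical naming)."""
    PIPELINE = "pipeline"
    RTC = "rtc"


@dataclass
class Portion:
    """A hypothetical unit of a program, tagged with illustrative traits."""
    name: str
    has_loop: bool = False       # feedback loop or long dependency chain
    heavy_compute: bool = False  # substantial computation
    large_table: bool = False    # large flow table


def choose_mode(portion: Portion) -> Mode:
    """Toy rule (an assumption): pipeline-unfriendly portions go to RTC."""
    if portion.has_loop or portion.heavy_compute or portion.large_table:
        return Mode.RTC
    return Mode.PIPELINE


def configure_blocks(portions, num_blocks):
    """Give each portion one block, in program order.

    Returns a list of (block index, mode, portion name) tuples.
    """
    if len(portions) > num_blocks:
        raise ValueError("not enough blocks for this program")
    return [(i, choose_mode(p), p.name) for i, p in enumerate(portions)]


# Example program; the portion names are made up for illustration.
program = [
    Portion("parse_headers"),
    Portion("l3_forward"),
    Portion("ml_aggregate", heavy_compute=True),
    Portion("cache_lookup", large_table=True),
]
layout = configure_blocks(program, num_blocks=8)
for block, mode, name in layout:
    print(f"block {block}: {mode.value:8s} <- {name}")
```

A real mapper would presumably weigh throughput, per-block resources and dependencies between portions rather than apply a single boolean rule. The sketch only shows that the same pool of blocks can end up split differently for different programs.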
Evaluation
The authors conduct software-based ASIC simulations and build an FPGA-based prototype to validate the architecture. OptimusPrime uses less than 6% additional area compared to a programmable pipeline of the same specification. They implement and analyze three use cases on OptimusPrime: machine learning parameter aggregation, in-network caching, and network function integration. According to the authors, all three show markedly better performance and flexibility.
Q : Is it possible to make the blocks transformable at runtime?
A : We can’t support that, but I think it’s a good direction for our research in the future.
Personal thoughts
OptimusPrime lets switches and SmartNICs take their programmability to a new level. It retains high performance while giving the programmable pipeline richer packet processing capabilities. It also keeps the P4-based programming paradigm and performs well across a variety of network applications.