P4runpro: Enabling Runtime Programmability for RMT Programmable Switches

Title: P4runpro: Enabling Runtime Programmability for RMT Programmable Switches

Authors: Yifan Yang, Lin He (Tsinghua University); Jiasheng Zhou (Fuzhou University); Xiaoyi Shi (Tsinghua University); Jiamin Cao (Alibaba Cloud); Ying Liu (Tsinghua University)

Scribe: Chengjin Zhou (Nankai University)

Introduction
Programmable switches have enhanced many network functions by letting operators define flexible protocols and processing logic. However, because P4 offers a limited level of abstraction (it defines only a single data plane context), data plane programs for current programmable switches must be specified at compile time and cannot be updated at runtime. When a running program needs to change, the operator has to re-provision the switch, which disrupts traffic and suspends unrelated switch programs. Existing work on runtime updates falls into three categories. The first (FlexPlan NSDI’23, Menshen NSDI’22) extends the switch architecture but is not compatible with RMT programmable switches. The second (HyPer4 CoNEXT’16, Tiara NSDI’22) virtualizes the data plane but introduces large resource overhead. The third (NetVRM NSDI’22, FlyMon SIGCOMM’22) dynamically allocates resources and operations but lacks generality.

Key idea and contribution

This work proposes P4runpro to support runtime updates and dynamic allocation. The key idea of P4runpro is to decouple the implementation of switch programs from hardware resources. This decoupling is achieved by identifying and abstracting a set of atomic operations from a variety of heterogeneous switch programs.

To support multiple isolated programs simultaneously, P4runpro abstracts the execution logic of heterogeneous programs into general programming units called runtime programming blocks (RPBs). A switch program then becomes multiple rounds of atomic operations executed in RPBs. Because different programs reuse the same hardware resources, resource consumption is reduced.
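The RPB idea can be illustrated with a toy software model. This is only a sketch under stated assumptions, not P4runpro's actual data plane: the operation names (`add`, `and`, `set_port`), the `RPB` class, and the per-program entry layout are all hypothetical, chosen to show how fixed blocks can serve several programs when behaviour is selected by installed entries rather than by compiled logic.

```python
# Toy model: fixed blocks whose behaviour is selected per program by entries.
# All names here are hypothetical; the notes do not list P4runpro's operation set.

def op_add(state, arg):
    state["reg"] += arg

def op_and(state, arg):
    state["reg"] &= arg

def op_set_port(state, arg):
    state["port"] = arg

# The fixed set of atomic operations every block can perform.
ATOMIC_OPS = {"add": op_add, "and": op_and, "set_port": op_set_port}


class RPB:
    """One runtime programming block: a table from program ID to an operation."""

    def __init__(self):
        self.entries = {}  # program_id -> (operation name, argument)

    def install(self, program_id, op, arg):
        self.entries[program_id] = (op, arg)

    def remove(self, program_id):
        self.entries.pop(program_id, None)

    def apply(self, program_id, state):
        entry = self.entries.get(program_id)
        if entry is not None:  # on a miss the packet state passes through unchanged
            op, arg = entry
            ATOMIC_OPS[op](state, arg)


def run_pipeline(rpbs, program_id, state):
    """Run one packet's state through every block in order."""
    for rpb in rpbs:
        rpb.apply(program_id, state)
    return state


pipeline = [RPB() for _ in range(3)]

# Program 1: reg = (reg + 5) & 0xF, then forward to port 2.
pipeline[0].install(1, "add", 5)
pipeline[1].install(1, "and", 0xF)
pipeline[2].install(1, "set_port", 2)

# Program 2 shares the same blocks but only sets the output port.
pipeline[0].install(2, "set_port", 7)

result_1 = run_pipeline(pipeline, 1, {"reg": 12, "port": 0})
result_2 = run_pipeline(pipeline, 2, {"reg": 12, "port": 0})
```

In this model, adding or removing a program only touches entries, so the blocks themselves never need to be rebuilt, which mirrors the motivation for avoiding re-provisioning.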

To overcome the challenge of limited operation capacity, P4runpro introduces primitives and pseudo primitives as runtime programming interfaces.

To dynamically and automatically link runtime programs to the data plane, the P4runpro compiler translates input programs into entries and consistently updates them in the data plane. The key to the compiler is efficient resource allocation.
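These notes do not describe the compiler's allocation algorithm, so the following is only a stand-in: a first-fit greedy placement that maps an ordered list of primitives onto blocks with limited free entry slots. The primitive names (`hash`, `mem_read`, `forward`), the capacities, and the functions `allocate` and `commit` are assumptions made for illustration. The sketch does show one property the text mentions: the placement is computed in full before anything is installed, so a program that does not fit leaves the data plane untouched.

```python
from typing import List, Optional, Tuple

Placement = List[Tuple[int, str]]  # (block index, primitive name)


def allocate(program: List[str], free_entries: List[int]) -> Optional[Placement]:
    """First-fit placement that keeps primitives in program order.

    Each primitive goes to the earliest block that still has a free entry
    and comes after the block used by the previous primitive.
    Returns None, without side effects, if the program does not fit.
    """
    placement: Placement = []
    next_block = 0
    for primitive in program:
        block = next_block
        while block < len(free_entries) and free_entries[block] == 0:
            block += 1
        if block == len(free_entries):
            return None
        placement.append((block, primitive))
        next_block = block + 1
    return placement


def commit(placement: Placement, free_entries: List[int]) -> None:
    """Consume one entry slot in each block used by an accepted placement."""
    for block, _ in placement:
        free_entries[block] -= 1


free = [1, 0, 2, 1]  # hypothetical free entry slots per block

first = allocate(["hash", "mem_read", "forward"], free)
if first is not None:
    commit(first, free)

# A second program no longer fits, so nothing is installed for it.
second = allocate(["hash", "forward"], free)
```

A real allocator would also have to account for memory and for programs that need more rounds than there are blocks; this sketch ignores both.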

Evaluation
To evaluate the expressiveness of P4runpro primitives and pseudo primitives, this work compares the lines of code (LoC) of each P4runpro program’s complete processing logic with the control block of a P4 program with equivalent functionality; the comparison results are shown in the Table below. The results are significant because they support the generality of P4runpro and highlight its ability to express complex P4 logic more concisely.

To evaluate performance, this work measures the average update delay over 50 repeated updates for each program. The results are presented in the Table below, where Update Delay is the time required to update the data plane. The results indicate a positive correlation between update delay and program complexity. HLL has a higher update delay because of its large number of inelastic case blocks. Compared to FlyMon and ActiveRMT, P4runpro exhibits faster update times for most programs. This is significant because it indicates that P4runpro outperforms FlyMon and ActiveRMT in update performance across heterogeneous programs.
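The measurement method (mean delay over 50 repeated updates) can be sketched as a small timing harness. The `update` callable below is a placeholder for whatever installs a program's entries on the switch; the notes do not describe the authors' actual measurement tooling, so this is an assumption about how such a measurement might be taken.

```python
import time
from statistics import mean


def average_update_delay(update, repeats=50):
    """Return the mean wall-clock time, in seconds, of `repeats` calls to `update`."""
    delays = []
    for _ in range(repeats):
        start = time.perf_counter()
        update()
        delays.append(time.perf_counter() - start)
    return mean(delays)


# Placeholder update: a real one would push entries to the data plane.
calls = []

def fake_update():
    calls.append(1)

avg_delay = average_update_delay(fake_update)
```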

Q&A
Q1: When you are running multiple applications on the switch, how do you handle resource scheduling?
A1: Our control plane does not resize resources, including stages and memory, dynamically. We allocate resources at compile time and do not change these allocations dynamically. If we want to reallocate resources, we need to recompile the system.

Q2: If we run the original network functions, such as forwarding or other basic functions, what is the impact of running these alongside applications like network caching?
A2: Our system is not ideal for compute-intensive applications. It is more suitable for applications that are lookup-intensive, such as forwarding and load balancing.

Q3: For networking functions that include common functions like packet forwarding and classification, how do you manage shared functionalities across different programs without replication?
A3: This is a limitation of our current system that we aim to address in the future. At present, we cannot support the parallelism of different functions or reuse the same functions for different flows, as these are set initially at compile time.

Personal thoughts
P4runpro is a runtime-reconfigurable solution for RMT switches with acceptable overhead. I think one shortcoming of P4runpro is that it limits the expressiveness of the P4 language. For example, it does not support dynamic header parsing or network aggregation, which reduces the flexibility that P4 offers. For a runtime-reconfigurable solution for programmable devices, both generality and expressiveness must be considered and addressed.