EP3: Understanding the Host Network (SIGCOMM'24), Nov. 20, 2024

Paper: Understanding the Host Network (Best Student Paper, SIGCOMM’24)
Authors: Midhul Vuppalapati, Saksham Agarwal, Henry N. Schuh, Baris Kasikci, Arvind Krishnamurthy, Rachit Agarwal

Presenter: Rulan Yang, Xiamen University, China
Guest of Honor: Midhul Vuppalapati

After the paper sharing, the participants had a lively discussion. The following is a partial Q&A record:

Q1: Around 1980, people considered designing computers with many interconnected CPUs, because a computer's internal network is essentially designed for high bandwidth. Are we going back to that design, building one big computer to make the problem go away?
A1: Midhul believes host networks will still need to be studied, because the problems will not disappear. Even a supercomputer with ultra-high bandwidth still interconnects many heterogeneous components, such as CPUs and memory. Many problems are not caused by a lack of bandwidth; heterogeneous resources and different transfer mechanisms can cause problems as well.

Q2: What tools do you use to measure what is happening inside the host network? For example, what tools do you use to measure latency?
A2: This is actually one of the hardest parts. Midhul and his team ran their experiments on Intel's latest processors, which provide very low-level tools for monitoring system performance. They use the processors' low-level registers to record what happens in the system. The hardware records events at close to nanosecond granularity, and software reads the registers at a reasonable, configurable interval. What is happening in the system can then be inferred by analyzing these records.
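
To make the read-and-analyze step in A2 concrete, here is a minimal sketch of turning periodic readings of a cumulative hardware counter into per-interval event rates. It is not the team's actual tooling: the `Sample` type, the `rates_per_interval` function, and the 48-bit counter width are assumptions made for illustration, and the readings are made-up numbers rather than real measurements.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t_ns: int   # time at which the counter was read, in nanoseconds
    count: int  # cumulative event count read from the counter


def rates_per_interval(samples, counter_bits=48):
    """Return events per second between each pair of consecutive samples.

    counter_bits is an assumed counter width; the modulo below tolerates
    a single wraparound of the counter between two reads.
    """
    wrap = 1 << counter_bits
    rates = []
    for prev, cur in zip(samples, samples[1:]):
        delta = (cur.count - prev.count) % wrap
        dt_ns = cur.t_ns - prev.t_ns
        rates.append(delta * 1e9 / dt_ns)
    return rates


# Hypothetical readings taken 1 ms apart.
readings = [Sample(0, 1000), Sample(1_000_000, 4000), Sample(2_000_000, 4500)]
print(rates_per_interval(readings))
```

A shorter read interval gives finer time resolution but costs more CPU time for the reads themselves, which is presumably why the interval has to be chosen as "reasonable".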

Q3: Is the concept of host network domain defined automatically or manually? Will there be different analysis results under different definitions?
A3: These host network domains are determined by the processor architecture. Data flows can be classified into several traffic types with different read and write properties, and traffic of the same type shares a similar domain definition. Take the C2M (CPU-to-memory) domain as an example: the CPU hands the data to the LFB (Line Fill Buffer), and the LFB then sends it all the way to DRAM through the CHA (Caching and Home Agent) and the MC (memory controller). Once the CPU has handed the data to the LFB, it can do other things, but the LFB has to wait for the DRAM response. The C2M domain therefore includes the LFB, CHA, MC, and DRAM. Different processor architectures will have different domain definitions, so the definitions need to be discussed together with the specific architecture in use.
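
The reasoning in A3 can be written down as a small sketch: list the hops in order, then take the domain to be everything from the first hop that must wait for the response through to the endpoint. The names `C2M_PATH` and `domain_of` are hypothetical, and the hop list simply restates the C2M example above rather than any general rule from the paper.

```python
# Hop order for C2M traffic, as described in the answer above.
C2M_PATH = ["CPU core", "LFB", "CHA", "MC", "DRAM"]


def domain_of(path, first_waiting_hop):
    """Return the hops from the first one that waits for the response to the end."""
    start = path.index(first_waiting_hop)
    return path[start:]


# The CPU core is free after handing off to the LFB, so the domain starts there.
c2m_domain = domain_of(C2M_PATH, "LFB")
print(c2m_domain)
```

On a different processor architecture the hop list itself would change, which is why the domain definition has to be worked out per architecture.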

For more information about the discussion session, please watch our meeting recording, which has been uploaded to YouTube. Thank you for your support!

May I ask what the address of the meeting recording on YouTube is?