EP19: Edge Caching as Differentiation, Oct. 8, 2025

Paper: Edge Caching as Differentiation
Authors: Muhammad Abdullah, Mughees Ur Rehman, Pavlos Nikolopoulos, Katerina Argyraki
Presenter: Yining Jiang, Xiamen University
Guest of Honor: Muhammad Abdullah, Swiss Federal Institute of Technology in Lausanne

Q: What motivated you to study edge caching as your research topic?
A: My motivation started during my master’s when I was studying web browsing experience on Android and other mobile systems. I became interested in how user experience is affected by infrastructure and network design. When I started my PhD, I wanted to explore how network infrastructures influence end-user experience—especially since nowadays most online activities involve video consumption via CDNs. I wanted to know whether CDNs themselves could be introducing disparities in content delivery quality.

Q: Your paper shows that even two content providers using the same CDN can have very different cache hit rates. Have you explored whether better cache replacement strategies could reduce this disparity?
A: That’s exactly what we are investigating now. Currently, most industrial CDNs still rely on simple mechanisms like LRU and TTL-based eviction. We’re exploring whether more intelligent caching policies—perhaps AI-assisted—could reduce the quality gap between popular and less popular content without adding significant complexity.
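
As a rough illustration of the baseline policies mentioned above, here is a minimal sketch of an LRU cache with TTL-based eviction. The class name, capacity, and TTL parameters are illustrative assumptions, not any production CDN's implementation:

```python
import time
from collections import OrderedDict

class LRUTTLCache:
    """Toy LRU cache with per-object TTL eviction (illustrative only)."""

    def __init__(self, capacity, ttl_seconds):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self.store = OrderedDict()  # key -> (value, expiry_time)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None  # miss: object was never cached or was evicted
        value, expiry = entry
        if time.time() > expiry:
            del self.store[key]  # TTL expired: also counts as a miss
            return None
        self.store.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        elif len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # evict least recently used
        self.store[key] = (value, time.time() + self.ttl)
```

Under a policy like this, objects from less popular services are requested too rarely to stay near the recently-used end of the cache, which is why their hit rates lag even on the same CDN.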

Q: You used vantage points mostly in the US and Europe, but one was in Indonesia. Did you notice any distinctive observations there?
A: Yes. The overall patterns were similar, but in Indonesia, we noticed many requests were routed to other countries, such as those in Europe, due to limited CDN infrastructure locally. This cross-border routing occasionally increased miss latency and affected QoE compared to vantage points in the US and Europe.

Q: How would results differ if the same experiment were done in Africa or South America, where connectivity may be slower?
A: In such regions, overall QoE would be lower because of limited bandwidth and higher transmission delays. However, the relative disparity between services would remain similar, since what primarily drives the QoE gap is miss latency (how far the origin or edge cache is from the client), not absolute bandwidth.
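
The role of miss latency can be made concrete with a back-of-the-envelope calculation. The latency values below are invented purely for illustration; only the structure of the model matters:

```python
def avg_latency_ms(hit_rate, hit_ms, miss_ms):
    """Expected per-request latency under a simple hit/miss model."""
    return hit_rate * hit_ms + (1 - hit_rate) * miss_ms

# Illustrative numbers: ~20 ms for an edge-cache hit, ~300 ms for a fetch
# from a distant origin on a miss.
print(avg_latency_ms(0.95, 20, 300))  # well-cached service: ~34 ms on average
print(avg_latency_ms(0.40, 20, 300))  # poorly cached service: ~188 ms on average
```

Slower access links raise both averages, but the gap between the two services is set by hit rate and miss latency, which is why the relative disparity persists in regions with lower bandwidth.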

Q: If one provider keeps a 40% hit rate while another enjoys nearly 100%, wouldn’t users gradually shift to the better one, leading to a market monopoly?
A: That’s an insightful point. Unfortunately, cache hit rate isn’t directly under a provider’s control—they can’t simply pay a CDN for a higher hit rate. Instead, providers must innovate elsewhere, like using better compression or improving UX to attract users and grow popularity. As they gain popularity, their hit rate naturally improves.

Q: Could a provider manipulate cache hit rate by artificially generating many GET requests to keep their content cached?
A: Technically, yes—it’s possible to send repeated GET requests to push content into edge caches. However, CDNs have security mechanisms to detect such behavior, and ethically it’s not acceptable. So while it’s possible in theory, it’s not a sustainable or legitimate strategy.

Q: You mentioned waiting 24 hours between measurements. How did you decide this threshold?
A: Initially, I noticed that if I repeated measurements within a few hours, hit rates jumped to near 100% because my own requests had filled the caches. After examining TTLs, I found that most cached content lived for 12 to 24 hours. So I chose a 24-hour interval to ensure previous runs didn’t bias new results.
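
A straightforward way to observe TTLs of this kind is to inspect CDN response headers. The header names below differ across CDNs and the URL is hypothetical; this is a sketch of the general technique, not the paper's exact methodology:

```python
import requests

# Hypothetical URL; a real measurement would target actual video segment URLs.
url = "https://example-cdn.net/videos/segment_0001.ts"

resp = requests.head(url, timeout=10)

# CDNs expose cache state under different headers (X-Cache, CF-Cache-Status, Via, ...).
cache_status = resp.headers.get("X-Cache", "unknown")   # e.g. "HIT" or "MISS"
age = resp.headers.get("Age")                           # seconds the object has been cached
cache_control = resp.headers.get("Cache-Control")       # may include max-age, i.e. the TTL

print(f"cache={cache_status} age={age} cache-control={cache_control}")
```

Comparing Age against max-age gives a rough sense of how long an object will stay cached, which is what motivated spacing the runs 24 hours apart.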

Q: Does that mean only the first few users of a video experience low QoE?
A: Exactly. Once a few users access a video, it’s cached, and subsequent viewers enjoy better quality. However, if the video never reaches enough views, it stays uncached and remains unpopular. This creates a feedback loop—unpopular content stays unpopular because it never gets cached fast enough to attract new users.

Q: What was the most challenging part of this research?
A: The hardest part was building the measurement infrastructure—especially web crawlers. Every streaming service is different; you can’t use one crawler for all. I had to design and maintain many scripts, and since web pages change frequently, I often had to rewrite them. This was time-consuming and occasionally frustrating.
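
One common way to keep many per-service scripts manageable is to hide them behind a shared interface, so only the service-specific parts need rewriting when a page changes. The sketch below is illustrative; the class names, methods, and URLs are hypothetical rather than the paper's actual tooling:

```python
from abc import ABC, abstractmethod

class ServiceCrawler(ABC):
    """Shared crawler interface; each streaming service gets its own subclass
    because page layouts and player APIs differ."""

    @abstractmethod
    def list_video_urls(self) -> list[str]:
        """Return candidate video-page URLs for this service."""

    @abstractmethod
    def extract_segment_urls(self, video_url: str) -> list[str]:
        """Return the CDN URLs of the media segments behind a video page."""

class ExampleServiceCrawler(ServiceCrawler):
    # Hypothetical pages and paths; in practice these break whenever the
    # service redesigns its front end, which is what forces frequent rewrites.
    def list_video_urls(self) -> list[str]:
        return ["https://example-service.com/watch/123"]

    def extract_segment_urls(self, video_url: str) -> list[str]:
        return ["https://cdn.example-service.com/123/seg_0001.m4s"]
```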

Q: Which streaming platforms changed their interfaces most frequently?
A: The popular ones—Netflix, Prime Video, Hulu, etc.—tend to update their interfaces often, which breaks crawlers. Less popular services usually keep their pages stable, making them easier to handle.

Q: Did you run experiments in China or with Chinese CDNs like Alibaba Cloud?
A: Not in this work. Our vantage points were AWS EC2 instances, and at that time, EC2 wasn’t available in mainland China. Also, Chinese streaming services typically use domestic CDNs like Alibaba Cloud instead of AWS or Cloudflare. It would indeed be interesting to replicate our study using Alibaba’s CDN to see if they employ more advanced caching strategies.

Q: For new students who want to start research in measurement studies, what’s your advice?
A: Spend most of your time building a robust and reproducible measurement infrastructure. Small mistakes can ruin months of data collection. Start small—test with one VM and one tool before scaling up globally. Once your setup is solid, the analysis is the easy part. Robustness and reproducibility are everything in measurement research.