Title: The Next Generation of BGP Data Collection Platforms
Authors: Thomas Alfroy, Thomas Holterbach (University of Strasbourg); Thomas Krenc, kc Claffy (UC San Diego / CAIDA); Cristel Pelsser (UCLouvain)
Scribe: Ruyi Yao
Introduction
The study of the global Internet infrastructure relies on BGP data collection platforms that maintain BGP peering sessions with network operators who volunteer to share (sometimes portions of) their routing tables. A Vantage Point (VP) is a BGP router from which a platform collects routes.
The authors emphasize that more VPs are needed to improve the accuracy and coverage of scientific and operational analyses of Internet infrastructure, such as AS topology mapping, outage localization, and BGP hijack detection. They expect 25-100× more VPs to be needed.
Data collection presents challenges for both data providers and users. For data providers, the compound effect of more VPs and more updates per VP yields a quadratic increase in the number of updates reaching the collection platforms. Users, to save time and resources, often process only a sample of the data and accept lower-quality results in exchange.
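The quadratic growth mentioned above can be illustrated with a small sketch. It assumes, purely for illustration, that the number of VPs and the number of updates per VP both grow by the same factor; the function names and the baseline figures (100 VPs, 1000 updates per VP) are hypothetical and not taken from the paper.

```python
def total_updates(num_vps: int, updates_per_vp: int) -> int:
    """Total updates reaching the platform: one stream of updates per VP."""
    return num_vps * updates_per_vp


def scaled_total(base_vps: int, base_rate: int, factor: int) -> int:
    """Total updates when both the VP count and the per-VP rate grow by `factor`."""
    return total_updates(base_vps * factor, base_rate * factor)


# Hypothetical baseline: 100 VPs, each sending 1000 updates per time unit.
base = total_updates(100, 1000)

# Growth of the total relative to the baseline for factors 1..4.
growth = [scaled_total(100, 1000, k) // base for k in (1, 2, 3, 4)]
print(growth)  # [1, 4, 9, 16]
```

Scaling both quantities by a factor k multiplies the total by k squared, which is why adding VPs becomes costly faster than the VP count alone suggests.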
Key idea and contribution
- The authors used simulations and experiments to corroborate that important analyses lose accuracy and/or coverage when using heavily sampled topologies.
- The authors found strong data redundancy at different granularities and proposed GILL. GILL has two key mechanisms: an overshoot-and-discard collection scheme and sampling algorithms that maximize fairness.
Their contributions include:
- a survey, measurements, and simulations to demonstrate the limitations of current systems;
- a general framework and algorithms to assess and remove redundancy in BGP observations;
- a quantitative analysis of the benefit of their approach in terms of accuracy and coverage for several canonical BGP routing analyses, such as hijack detection and topology mapping;
- the implementation and deployment of a new BGP peering collection system that automates peering expansion using their redundancy analytics, which provides a path forward for a more thorough evaluation of this approach.
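To make the idea of removing redundant BGP observations concrete, here is a minimal sketch. It is not GILL's algorithm, which this summary does not describe. It assumes a deliberately simple, hypothetical definition of redundancy: an update is redundant if every AS link on its AS path has already been seen in an earlier update. The `Update` type, the helper names, and the sample data (documentation prefixes and private-use AS numbers) are all illustrative.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Update(NamedTuple):
    """Hypothetical, simplified BGP update as seen at a collection platform."""
    vp: str                  # vantage point that reported the update
    prefix: str              # announced prefix
    as_path: Tuple[int, ...]  # AS path, nearest AS first


def as_links(path: Tuple[int, ...]) -> Set[Tuple[int, int]]:
    """Adjacent AS pairs on a path, skipping prepending (repeated ASes)."""
    return {(a, b) for a, b in zip(path, path[1:]) if a != b}


def discard_redundant(updates: Iterable[Update]) -> List[Update]:
    """Keep an update only if it reveals at least one AS link not seen before.

    Assumption: redundancy is judged on AS links alone; the prefix is ignored.
    """
    seen: Set[Tuple[int, int]] = set()
    kept: List[Update] = []
    for update in updates:
        new_links = as_links(update.as_path) - seen
        if new_links:
            kept.append(update)
            seen |= new_links
    return kept


updates = [
    Update("vp1", "203.0.113.0/24", (64500, 64501, 64510)),
    Update("vp2", "203.0.113.0/24", (64502, 64501, 64510)),
    Update("vp3", "203.0.113.0/24", (64500, 64501, 64510)),  # same links as vp1
    Update("vp1", "198.51.100.0/24", (64500, 64501, 64511)),
]
kept = discard_redundant(updates)
print([u.vp for u in kept])  # ['vp1', 'vp2', 'vp1']
```

In this toy run the update from vp3 is discarded because it adds no new AS link. A real system would need a redundancy definition suited to each analysis, since an update that is redundant for topology mapping may still matter for hijack detection.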
Evaluation
Long-term impact: Simulations of a scenario where 50% (vs. 2%) of ASes peered with GILL tripled the number of peer-to-peer links observed, doubled the number of Internet failures that could be localized, and reduced the proportion of undetected forged-origin hijacks by 33%, all without processing more data than RIPE RIS and RouteViews (RV) do today.
Immediate benefits: While processing the same data volume, GILL improved accuracy and coverage: it inferred more AS relationships (+16%), identified and corrected errors in CAIDA’s ASrank dataset, and inferred more forged-origin hijacks (+23%) with ≈4× fewer incorrect inferences (i.e., false positives).