This Is a Local Domain: On Amassing Country-Code Top-Level Domains from Public Data

Title: This Is a Local Domain: On Amassing Country-Code Top-Level Domains from Public Data

Authors: Raffaele Sommese (University of Twente), Roland van Rijswijk-Deij (University of Twente), Mattijs Jonker (University of Twente)

Scribe: Rulan Yang (Xiamen University)

Introduction

(The talk was presented online.)

The web extends beyond generic TLDs (gTLDs): country-code top-level domains (ccTLDs) represent a significant portion of the local online landscape. Unlike gTLD zone files, however, ccTLD zone files are generally not publicly available. This lack of transparency leads to an underrepresentation of ccTLDs in research, limiting our understanding of the global and regional Web ecosystems.

Key idea and contribution

The authors analyzed public data sources and found that they cover 43%-80% of names in 19 ccTLDs, with coverage increasing over time. Combining Certificate Transparency (CT) logs with Common Crawl data improves coverage by up to 11 percentage points. An analysis using port scan data confirmed that public domain lists cover a significant portion of domains with an active web presence. They also found that 60% of newly registered domains appear in CT logs within a day and 80% within five days. Coverage for ccTLDs was comparable to that for gTLDs.
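To make the "within a day" and "within five days" figures concrete, here is a minimal sketch of how such a detection-delay share could be computed. It is not the authors' pipeline: the function name `fraction_seen_within`, the per-domain date dictionaries, and the toy dates are all assumptions made for illustration, and "within N days" is taken to mean a delay of at most N calendar days.

```python
from datetime import date

def fraction_seen_within(registered, first_seen, max_days):
    """Share of registered domains first seen at most `max_days` days
    after registration. Domains never seen count as misses."""
    if not registered:
        return 0.0
    hits = 0
    for name, reg_day in registered.items():
        seen_day = first_seen.get(name)
        if seen_day is None:
            continue  # never observed in the public source
        delay = (seen_day - reg_day).days
        if 0 <= delay <= max_days:
            hits += 1
    return hits / len(registered)

# Toy, made-up data: registration dates (hypothetical ground truth)
registered = {
    "a.example": date(2023, 1, 1),
    "b.example": date(2023, 1, 1),
    "c.example": date(2023, 1, 1),
    "d.example": date(2023, 1, 1),
    "e.example": date(2023, 1, 1),
}
# Toy, made-up data: first appearance in a CT log (e.example never appears)
first_seen = {
    "a.example": date(2023, 1, 1),  # same day
    "b.example": date(2023, 1, 2),  # 1 day later
    "c.example": date(2023, 1, 4),  # 3 days later
    "d.example": date(2023, 1, 8),  # 7 days later
}

within_1 = fraction_seen_within(registered, first_seen, 1)  # 2 of 5 names
within_5 = fraction_seen_within(registered, first_seen, 5)  # 3 of 5 names
print(f"within 1 day: {within_1:.0%}, within 5 days: {within_5:.0%}")
```

Counting never-observed domains in the denominator keeps the result comparable to an overall coverage figure; dropping them instead would measure delay only among domains that eventually appear.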

Evaluation

The ground truth comes from OpenINTEL project data covering 19 ccTLDs from 2018 to 2023. Measured against this baseline, public sources provide substantial coverage across different TLDs.
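The coverage comparison can be sketched as a set intersection between a ground-truth list of names and the names observed in public sources. The snippet below is only an illustration under stated assumptions: the helper names `normalize` and `coverage` and the tiny name sets are hypothetical, and it does not reproduce how the authors processed OpenINTEL, CT log, or Common Crawl data.

```python
def normalize(name):
    """Lower-case a domain name and drop surrounding whitespace and a trailing dot."""
    return name.strip().rstrip(".").lower()

def coverage(ground_truth, observed):
    """Fraction of ground-truth names that also appear in `observed`."""
    truth = {normalize(n) for n in ground_truth}
    seen = {normalize(n) for n in observed}
    if not truth:
        return 0.0
    return len(truth & seen) / len(truth)

# Toy, made-up data standing in for a ccTLD zone and two public sources
zone = {"a.example", "b.example", "c.example", "d.example", "e.example"}
ct_logs = {"A.example.", "b.example", "c.example", "x.example"}
common_crawl = {"c.example", "d.example"}

ct_only = coverage(zone, ct_logs)                  # 3 of 5 names
combined = coverage(zone, ct_logs | common_crawl)  # 4 of 5 names
gain_pp = (combined - ct_only) * 100               # gain in percentage points

print(f"CT only: {ct_only:.0%}, combined: {combined:.0%}, gain: {gain_pp:.1f} pp")
```

Names seen in public sources but absent from the ground truth (here `x.example`) do not affect coverage, since only the ground-truth set forms the denominator.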

Q1: Is it easy to get data from TLD operators?

A1: While technical teams are generally open to data sharing, the real challenge lies in navigating legal procedures and obtaining necessary agreements. This bureaucratic process is often more complicated than the technical setup itself, especially in academic settings.

Personal thoughts

This paper prompted me to think about the pros and cons of exposing ccTLD data to the public. Any decision to do so should weigh them carefully: while the transparency and research benefits are significant, measures should be in place to mitigate privacy risks, prevent misuse, and address operational challenges.