Towards identifying networks with internet clients using public data

2021 
Does an outage impact any users? Can a geolocation database known to be good at locating users and bad at infrastructure be trusted for a particular prefix? Is a content-heavy network likely to peer with a particular network? For these questions and many more, knowing which prefixes contain Internet users aids in interpreting Internet analyses. However, existing datasets of Internet activity are out of date, unvalidated, based on privileged data, or too coarse. As a step towards identifying which IP prefixes contain users, we present multiple novel techniques to identify which IP prefixes host web clients without relying on privileged data. Our techniques identify client activity in ASes responsible for 98.8% of Microsoft CDN traffic and in prefixes responsible for 95.2% of Microsoft CDN traffic. Fewer than 1% of the prefixes identified as active by our techniques do not contact Microsoft at all. We present measurements of Internet usage worldwide and sketch future directions for extending the techniques to measure relative activity levels across prefixes.