language-icon Old Web
English
Sign In

A Holistic View on Web Archives

2021 
In order to address the requirements of different user groups and use cases of web archives, we have identified three views to access and explore web archives: user-, data- and graph-centric. The user-centric view is the natural way to look at the archived pages in a browser, just like the live web is consumed. By zooming out from there and looking at whole collections in a web archive, data processing methods can enable analysis at scale. In this data-centric view, the web and its dynamics as well as the contents of archived pages can be looked at from two angles: (1) by retrospectively analysing crawl metadata with respect to the size, age and growth of the web and (2) by processing archival collections to build research corpora from web archives. Finally, the third perspective is what we call the graph-centric view, which considers websites, pages or extracted facts as nodes in a graph. Links among pages or the extracted information are represented by edges in the graph. This structural perspective conveys an overview of the holdings and connections among contained resources and information. Only all three views together provide the holistic view that is required to effectively work with web archives.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    59
    References
    0
    Citations
    NaN
    KQI
    []