Analysis and Correction of Web Documents’ Non-Compliance with Web Standards

2019 
Motivated by the case for equal accessibility of the World Wide Web (Web for short), we analyzed the non-compliance of collected web documents with web standards using a statistical physics approach. The web documents were examined with a validator that classified each instance of non-compliance as an error or a warning of a particular type. We found that the frequency distributions of the numbers of errors and warnings per web document followed a power law and that the numbers of errors and warnings were strongly correlated. In addition, some types of errors and warnings occurred much more frequently than others, a pattern that could be modeled by a geometric distribution. Using these properties, we proposed a correction scheme that focuses on the most frequently occurring errors and warnings. We tested the proposed method empirically against the collected web documents and showed that it corrected about 47% of errors and 85% of warnings. We also used network theory to analyze correlations within and between different errors and warnings in the correction results and found that some types of errors and/or warnings affected each other during correction. Finally, we compare the correction results of the proposed method with those of Tidy and discuss how the two correction methods differ.
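The idea of concentrating correction on the most frequent types can be illustrated with a minimal sketch. It is not the authors' implementation: the message-type names and counts below are hypothetical toy data, and the helpers `coverage_of_top_k` and `geometric_mle` are names introduced here for illustration only. The sketch assumes that type frequencies, ordered by rank r = 1, 2, ..., are described by a geometric law P(r) = p(1 - p)^(r - 1), and it estimates p by maximum likelihood.

```python
def coverage_of_top_k(type_counts, k):
    """Fraction of all occurrences covered by the k most frequent types."""
    total = sum(type_counts.values())
    if total == 0:
        return 0.0
    top = sorted(type_counts.values(), reverse=True)[:k]
    return sum(top) / total


def geometric_mle(type_counts):
    """Maximum-likelihood estimate of p for P(r) = p * (1 - p)**(r - 1),
    where r is the frequency rank of a type (1 = most frequent)."""
    ranked = sorted(type_counts.values(), reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("no occurrences to fit")
    mean_rank = sum(r * c for r, c in enumerate(ranked, start=1)) / total
    return 1.0 / mean_rank


# Hypothetical toy counts of validator messages by type (illustration only).
counts = {
    "missing-alt": 8,
    "unclosed-tag": 4,
    "bad-attribute": 2,
    "stray-end-tag": 1,
    "obsolete-element": 1,
}

if __name__ == "__main__":
    print("coverage of top 2 types:", coverage_of_top_k(counts, 2))
    print("estimated geometric p:", geometric_mle(counts))
```

When the rank distribution decays quickly, as in this toy example, handling only a few types already covers most occurrences, which is the intuition behind targeting the most frequent errors and warnings first.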