Web Extraction Method Based on DOM Tree and Domain Ontology

Guo Jianbing, Cui Zhi-ming, Chen Ming, Zhao Pengpeng

2012

To address the problem of automatically extracting data from Deep Web result pages with differing structures, this paper proposes a method that combines the structure of a Web page with its content. The method locates the data region using the characteristics of the data content together with DOM tree nodes marked against a domain ontology library. It then identifies individual data records with an improved simple tree matching algorithm. Experimental results show that the F-measure of this method is 2.93% to 6.67% higher than that of traditional methods.
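The abstract does not spell out how ontology-marked DOM nodes are turned into a data region. The following minimal sketch makes several assumptions. It flattens the domain ontology into a set of surface terms per concept and marks any node whose own text mentions one of them. It then takes the data region to be the deepest node whose subtree still holds most of the marked nodes. `BOOK_ONTOLOGY`, `locate_data_region`, and the `coverage` threshold of 0.8 are all hypothetical, and the sketch ignores the data-content characteristics the authors also use.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Minimal DOM node: a tag, its own text, and child nodes."""
    tag: str
    text: str = ""
    children: list[Node] = field(default_factory=list)


# Hypothetical domain ontology for book search results: concept -> surface terms.
BOOK_ONTOLOGY: dict[str, set[str]] = {
    "Title": {"title"},
    "Author": {"author"},
    "Price": {"price"},
    "ISBN": {"isbn"},
}


def is_marked(node: Node, ontology: dict[str, set[str]]) -> bool:
    """A node is 'marked' if its own text mentions any ontology term."""
    text = node.text.lower()
    return any(term in text for terms in ontology.values() for term in terms)


def count_marked(node: Node, ontology: dict[str, set[str]], counts: dict[int, int]) -> int:
    """Post-order pass: record how many marked nodes each subtree contains."""
    total = int(is_marked(node, ontology))
    for child in node.children:
        total += count_marked(child, ontology, counts)
    counts[id(node)] = total
    return total


def locate_data_region(root: Node, ontology: dict[str, set[str]],
                       coverage: float = 0.8) -> Optional[Node]:
    """Return the deepest node whose subtree still holds `coverage` of all marked nodes."""
    counts: dict[int, int] = {}
    total = count_marked(root, ontology, counts)
    if total == 0:
        return None
    node = root
    while True:
        # Follow the child with the most marked descendants, but only if it alone suffices.
        best = max(node.children, key=lambda c: counts[id(c)], default=None)
        if best is None or counts[id(best)] < coverage * total:
            return node
        node = best


# Usage: a toy result page with two book records between navigation and footer.
page = Node("body", children=[
    Node("div", "Navigation: Home | Help"),
    Node("table", children=[
        Node("tr", children=[Node("td", "Title: Data Mining"),
                             Node("td", "Author: Han"), Node("td", "Price: $50")]),
        Node("tr", children=[Node("td", "Title: Web Data"),
                             Node("td", "Author: Liu"), Node("td", "Price: $45")]),
    ]),
    Node("div", "Copyright 2012"),
])
print(locate_data_region(page, BOOK_ONTOLOGY).tag)  # table
```

Descending only while a single child keeps the required share stops the search at the table rather than at one row. This matches the intuition that a data region contains several records.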
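The abstract does not describe the authors' improvement to simple tree matching. For reference, the following sketch implements the classic simple tree matching (STM) algorithm that such record-identification methods build on. It adds a normalized similarity and a naive grouping of adjacent, structurally similar siblings into records. `group_records` and its 0.8 threshold are hypothetical illustrations, not the paper's improved algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Minimal DOM node: only the tag and children matter for structural matching."""
    tag: str
    children: list[Node] = field(default_factory=list)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Classic STM: size of the maximum top-down matching between trees a and b."""
    if a.tag != b.tag:
        return 0  # roots with different tags cannot be matched
    m, n = len(a.children), len(b.children)
    # M[i][j] = best matching between the first i children of a and the first j of b.
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = simple_tree_matching(a.children[i - 1], b.children[j - 1])
            M[i][j] = max(M[i][j - 1], M[i - 1][j], M[i - 1][j - 1] + w)
    return M[m][n] + 1  # +1 for the matched roots


def tree_size(t: Node) -> int:
    return 1 + sum(tree_size(c) for c in t.children)


def normalized_similarity(a: Node, b: Node) -> float:
    """Scale STM to [0, 1] by the average size of the two trees."""
    return 2 * simple_tree_matching(a, b) / (tree_size(a) + tree_size(b))


def group_records(region: Node, threshold: float = 0.8) -> list[list[Node]]:
    """Hypothetical heuristic: merge runs of adjacent, structurally similar children."""
    groups: list[list[Node]] = []
    for child in region.children:
        if groups and normalized_similarity(groups[-1][-1], child) >= threshold:
            groups[-1].append(child)
        else:
            groups.append([child])
    return groups


# Usage: two slightly different record layouts followed by an unrelated paragraph.
r1 = Node("div", [Node("a"), Node("span"), Node("span")])
r2 = Node("div", [Node("a"), Node("span")])
region = Node("ul", [r1, r2, Node("p")])
print(simple_tree_matching(r1, r2))             # 3
print(round(normalized_similarity(r1, r2), 3))  # 0.857
print([len(g) for g in group_records(region)])  # [2, 1]
```

STM tolerates records with optional fields (here a missing `span`) because unmatched children simply do not contribute. This tolerance is why it is a common basis for record identification.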
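The F-measure used in such evaluations is conventionally the harmonic mean of precision and recall. The following small sketch assumes precision = correctly extracted records / extracted records and recall = correctly extracted records / records actually present. The paper's exact counting rules are not given here, and the counts in the usage line are made up for illustration.

```python
def f_measure(correct: int, extracted: int, actual: int) -> float:
    """Balanced F-measure (F1) from record counts.

    correct:   extracted records that are true data records
    extracted: all records the method returned
    actual:    true data records present on the pages
    """
    precision = correct / extracted if extracted else 0.0
    recall = correct / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Usage with made-up counts: P = 0.9, R = 0.75, so F = 9/11.
print(round(f_measure(correct=90, extracted=100, actual=120), 4))  # 0.8182
```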

Keywords:

Blossom algorithm
Ontology
Document Object Model
Data mining
Computer science
Web extraction
Data content
