Attribute Extracting from Wikipedia Pages in Domain Automatically

2017 
In the age of Big Data, input determines output. The internet holds a large amount of data but little knowledge, so researchers have developed various methods to extract knowledge automatically from different data platforms. Traditional supervised learning methods cost considerable time and labor, and are gradually being replaced by semi-supervised and unsupervised learning methods. In this paper we propose a new semi-supervised method for this task, called TSVM (Transductive Support Vector Machine), which costs little time and labor. To improve accuracy and the level of intelligence, we also incorporate word embeddings into the semi-supervised method. The AP (Affinity Propagation) algorithm is used to cluster words automatically. Experimental results show that the method extracts attribute information in the military transportation domain from Wikipedia better than the traditional supervised learning method.