Extracting Addresses from Unstructured Text Using Bi-directional Recurrent Neural Networks

2018 
Addresses can be classified as unstructured text because they lack the meta-information needed to index them directly in databases. Still, they exhibit an internal structure that can be used to extract them automatically with machine learning techniques. In this work, we describe a machine learning approach that identifies addresses in unstructured text (such as blogs) using Bidirectional Recurrent Neural Networks (BRNNs). We overcome the lack of training data by generating synthetic free-text entries, and we design problem-specific features. Our system imposes no strict conditions on the structure or style of addresses, which makes it applicable to many real-world settings.
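The pipeline summarized in the abstract (synthetic training text, problem-specific token features, and a bidirectional recurrent tagger) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The word lists, the four token features, the BIO tag names, the plain tanh RNN cells, and the single sigmoid output are all hypothetical choices, and the weights are random rather than trained. What the sketch does show is the structural idea: each token's score combines a forward hidden state (left context) with a backward hidden state (right context).

```python
import math
import random

# Hypothetical vocabularies for synthetic text; not taken from the paper.
STREETS = ["Main", "Oak", "Maple", "Park"]
SUFFIXES = ["Street", "Avenue", "Road", "Lane"]
CITIES = ["Springfield", "Riverside", "Fairview"]
FILLER = ["We", "loved", "the", "cafe", "at", "last", "week", "and", "will", "return"]


def synthetic_example(rng):
    """Embed a generated address in filler text; return tokens and BIO tags."""
    address = [str(rng.randint(1, 999)), rng.choice(STREETS),
               rng.choice(SUFFIXES), ",", rng.choice(CITIES)]
    left, right = rng.sample(FILLER, 3), rng.sample(FILLER, 2)
    tokens = left + address + right
    tags = (["O"] * len(left) + ["B-ADDR"]
            + ["I-ADDR"] * (len(address) - 1) + ["O"] * len(right))
    return tokens, tags


def token_features(tok):
    """Hypothetical hand-crafted features: digit, capitalized, street suffix, comma."""
    return [1.0 if tok.isdigit() else 0.0,
            1.0 if tok[:1].isupper() else 0.0,
            1.0 if tok in SUFFIXES else 0.0,
            1.0 if tok == "," else 0.0]


def rnn_pass(xs, W, U, b):
    """Run a simple tanh RNN over feature vectors; return one hidden state per step."""
    h = [0.0] * len(b)
    states = []
    for x in xs:
        # New state from the current input and the previous state.
        h = [math.tanh(sum(W[i][j] * x[j] for j in range(len(x)))
                       + sum(U[i][k] * h[k] for k in range(len(h))) + b[i])
             for i in range(len(b))]
        states.append(h)
    return states


def init_params(rng, n_in, n_hidden):
    """Random (untrained) weights for the forward, backward, and output layers."""
    def rand(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {"fw": (rand(n_hidden, n_in), rand(n_hidden, n_hidden), [0.0] * n_hidden),
            "bw": (rand(n_hidden, n_in), rand(n_hidden, n_hidden), [0.0] * n_hidden),
            "out": (rand(1, 2 * n_hidden)[0], 0.0)}


def birnn_address_probs(tokens, params):
    """Per-token probability of being part of an address."""
    xs = [token_features(t) for t in tokens]
    fw = rnn_pass(xs, *params["fw"])                # reads left to right
    bw = rnn_pass(xs[::-1], *params["bw"])[::-1]    # reads right to left, re-aligned
    w_out, b_out = params["out"]
    probs = []
    for hf, hb in zip(fw, bw):
        z = sum(w * h for w, h in zip(w_out, hf + hb)) + b_out  # concatenated states
        probs.append(1.0 / (1.0 + math.exp(-z)))
    return probs


if __name__ == "__main__":
    rng = random.Random(0)
    tokens, tags = synthetic_example(rng)
    params = init_params(rng, n_in=4, n_hidden=3)
    for tok, tag, p in zip(tokens, tags, birnn_address_probs(tokens, params)):
        print(f"{tok:12s} {tag:7s} {p:.2f}")
```

Training would fit these weights to the synthetic tags (for example by backpropagation through time), and a full tagger would typically predict the B/I/O labels with a softmax rather than a single in-address probability.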