Text Mining for Factor Modeling of Japanese Stock Performance

2021 
Recent advances in natural language processing (NLP) have led to significant breakthroughs across a range of NLP tasks. Word embeddings and pre-trained language models can convert unstructured textual data into computable numerical vectors that capture meaning and contextual information. However, most public language models cover only English, and financial applications are likewise concentrated on English. Few Japanese studies take a direct approach, predicting not attributes of a text but real-world information. This study aims to build a factor model for the Japanese stock market based on the textual data of corporate Annual Securities Reports and to investigate its effectiveness. We found that textual data complement financial numerical data and improve the p-value of the Gibbons-Ross-Shanken test, reducing the factor model's pricing anomaly. Furthermore, BERT [1] performs much better than word2vec [2] and LDA [3] for this factor modeling.
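The Gibbons-Ross-Shanken (GRS) test mentioned above jointly tests whether the pricing-error intercepts (alphas) of a set of test portfolios are all zero under a given factor model; a higher p-value means the factors leave less unexplained return. The abstract does not give the authors' implementation, so the following is a minimal sketch of the standard GRS statistic (time-series regressions with MLE covariance estimates, compared against an F(N, T-N-K) distribution); all function and variable names are illustrative.

```python
import numpy as np
from scipy import stats

def grs_test(excess_returns, factors):
    """Gibbons-Ross-Shanken test: are all alphas jointly zero?

    excess_returns: (T, N) array of test-portfolio excess returns
    factors:        (T, K) array of factor excess returns
    Returns (GRS statistic, p-value) under F(N, T - N - K).
    """
    T, N = excess_returns.shape
    K = factors.shape[1]
    # Time-series regression of each portfolio on the factors, with intercept.
    X = np.hstack([np.ones((T, 1)), factors])
    B, *_ = np.linalg.lstsq(X, excess_returns, rcond=None)
    alphas = B[0]                          # (N,) estimated intercepts
    resid = excess_returns - X @ B
    sigma = resid.T @ resid / T            # MLE residual covariance (N, N)
    mu_f = factors.mean(axis=0)            # (K,) mean factor premia
    fc = factors - mu_f
    omega = fc.T @ fc / T                  # MLE factor covariance (K, K)
    quad_a = alphas @ np.linalg.solve(sigma, alphas)
    quad_f = mu_f @ np.linalg.solve(omega, mu_f)
    grs = (T - N - K) / N * quad_a / (1.0 + quad_f)
    pval = stats.f.sf(grs, N, T - N - K)   # survival function = 1 - CDF
    return grs, pval
```

In this framing, "textual data complement financial numerical data" means that adding text-derived factors to the factor matrix shrinks the joint alphas, pushing the GRS p-value up toward non-rejection.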