Paragraph Similarity Scoring and Fine-Tuned BERT for Legal Information Retrieval and Entailment

2020 
Assessing the relevance of legal documents and applying the legal rules embodied in them are key processes in the field of law. In this paper, we present our approach to the 2020 Competition on Legal Information Extraction/Entailment (COLIEE-2020), which gives researchers the opportunity to find ways of accomplishing these complex tasks using computers. We describe the methods used to build the models for the four tasks that make up the competition and the results of their application. For Task 1, which concerns predicting whether a base case cites a candidate case, we devise a method for evaluating the similarity between cases based on the similarity of their individual paragraphs. This method can be used to reduce the number of candidate cases by 85% while retaining over 80% of the cited cases. We then train a Support Vector Machine model to make the final prediction. This model was the best solution submitted for Task 1. We use a similar method for Task 2. For Task 3, we use an approach based on the BM25 measure in combination with the identification of similar previously asked questions. For Task 4, we use a transformer model fine-tuned on existing entailment data sets as well as on the provided domain-specific statutory law data set.
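To make the BM25 measure mentioned for Task 3 concrete, the following is a minimal sketch of standard Okapi BM25 scoring, not the authors' implementation. The function name `bm25_scores`, the parameter values `k1=1.5` and `b=0.75`, the whitespace tokenization, and the toy statute texts are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query (Okapi BM25).

    k1 and b are common default values, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            # Smoothed inverse document frequency (always positive).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturation with document length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy "articles" and question, invented for this example.
articles = [
    "a contract is formed by offer and acceptance".split(),
    "a minor may rescind a contract".split(),
    "ownership of land extends above and below the surface".split(),
]
query = "may a minor rescind a contract".split()

scores = bm25_scores(query, articles)
best = max(range(len(articles)), key=scores.__getitem__)
print(best, scores)
```

In a retrieval setting, the articles would be ranked by these scores and the top-ranked ones returned as candidates for the question.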