SPaR.txt, a cheap Shallow Parsing approach for Regulatory texts
2021
Automated Compliance Checking (ACC) systems aim to semantically parse
building regulations into a set of rules. However, semantic parsing is known to
be hard and requires large amounts of training data. The complexity of creating
such training data has led to research that focuses on small sub-tasks, such as
shallow parsing or the extraction of a limited subset of rules. This study
introduces a shallow parsing task for which training data is relatively cheap
to create, with the aim of learning a lexicon for ACC. We annotate a small
domain-specific dataset of 200 sentences, SPaR.txt, and train a sequence tagger
that achieves an F1-score of 79.93 on the test set. We then show through manual
evaluation that the model identifies most (89.84%) defined terms in a set of
building regulation documents, and that both contiguous and discontiguous
Multi-Word Expressions (MWE) are discovered with reasonable accuracy (70.3%).