Towards a hybrid NLG system for Data2Text in Portuguese

2015 
Many new forms of interaction with machines, such as dialogue or voice output, require converting information internal to a system into sentences, a task handled by Data2Text systems. To avoid the limitations of template-based and classical NLG methods, systems based on automatic translation have been proposed in recent years. These systems provide the sentence variability needed for better interaction, but at a cost: unlike template-based systems, they produce sentences of heterogeneous quality. In this paper we propose combining a translation-based NLG system with a classifier module capable of providing information on the Intelligibility or Quality of the sentences. Sentences marked as unacceptable are replaced by template-generated ones. This classifier module is the main focus of the paper and combines the extraction of linguistic features with a classifier trained on a manually annotated corpus. Results suggest that the approach is valid: the best results obtained have a false-positive rate below 8%, and this figure can be even lower in practical applications, decreasing to around 3%, because the generation module produces low-quality sentences at a rate lower than 30%.
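The fallback logic described in the abstract can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the function names (`hybrid_generate`, `toy_translate`, `toy_classifier`, `toy_template`), the weather-style input record, and the punctuation-based acceptance rule are all hypothetical stand-ins. In the paper, the acceptance decision comes from a classifier trained on linguistic features and a manually annotated corpus.

```python
from typing import Callable


def hybrid_generate(
    data: dict,
    translate: Callable[[dict], str],
    is_acceptable: Callable[[str], bool],
    template: Callable[[dict], str],
) -> str:
    """Return the translation-based sentence if it is judged acceptable,
    otherwise fall back to the template-generated sentence."""
    candidate = translate(data)
    if is_acceptable(candidate):
        return candidate
    return template(data)


# --- Hypothetical stand-ins, for illustration only ---

def toy_translate(data: dict) -> str:
    # Stand-in for the translation-based generator: here the "output"
    # is simply read from the input record.
    return data.get("mt_output", "")


def toy_classifier(sentence: str) -> bool:
    # Stand-in for the quality classifier. A real module would extract
    # linguistic features and apply a trained model; this placeholder
    # only checks that the sentence is non-empty and ends with a period.
    return bool(sentence) and sentence.endswith(".")


def toy_template(data: dict) -> str:
    # Stand-in for the template-based generator.
    return f"A temperatura em {data['city']} é de {data['temp']} graus."


good = {"city": "Aveiro", "temp": 21, "mt_output": "Em Aveiro estão 21 graus."}
bad = {"city": "Aveiro", "temp": 21, "mt_output": "graus 21 Aveiro em"}

accepted = hybrid_generate(good, toy_translate, toy_classifier, toy_template)
replaced = hybrid_generate(bad, toy_translate, toy_classifier, toy_template)
```

Passing the three components as callables keeps the fallback rule independent of how each one is built, so a trained classifier could replace the placeholder without changing `hybrid_generate`.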