Multi-Turn Video Question Generation via Reinforced Multi-Choice Attention Network

2020 
Video question generation (VQG) is a challenging task in visual information retrieval: given a sequence of video frames, the goal is to generate questions about them. Existing methods mainly tackle single-turn video question generation, but a single-turn conversation usually cannot meet the needs of video information acquisition. In this paper, we propose a new framework for single-turn VQG that introduces an attention mechanism to perform inference over the dialog history and a selection mechanism to choose among the candidate questions generated from each round of the dialog history. Within the framework, we leverage a recent video question answering model to predict the answer to each generated question and use the answer quality as the reward to fine-tune our model with reinforcement learning. We also introduce a new task, multi-turn video question generation (M-VQG), which generates multiple questions based on the dialog history and video information to build a conversation step by step. Our method achieves state-of-the-art performance on the single-turn VQG task on two large-scale datasets, YouTube-Clips and TACoS-MultiLevel, and provides a baseline approach for the M-VQG task.