On Modeling Context from Objects with a Long Short-Term Memory for Indoor Scene Recognition

2019 
Recognizing indoor scenes is still regarded as an open challenge in the computer vision field. Indoor scenes can be well represented by their constituent objects, which can vary in angle and appearance and are often partially occluded. Even though Convolutional Neural Networks perform remarkably well on image-related problems, the best results on indoor scenes come from approaches that model the intricate relationships among objects. Since Recurrent Neural Networks were designed to model structure in a given sequence, we propose representing an image as a sequence of object-level information in order to feed a bidirectional Long Short-Term Memory network trained for scene classification. We adopt a many-to-many training approach in which each sequence element outputs a scene prediction, allowing us to use every prediction to boost recognition. Our method outperforms RNN-based approaches on MIT67, an entirely indoor dataset, and also improves over the most successful methods through an ensemble of classifiers.
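
The many-to-many idea above, where every sequence element emits its own scene prediction, can be illustrated with a small sketch. The abstract does not say how the per-element predictions are combined, so the averaging of softmax probabilities below is an assumption, and the function names and the toy logits are hypothetical. The logits stand in for the per-element outputs of the bidirectional LSTM.

```python
import math


def softmax(logits):
    """Convert one element's class logits into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_scene_predictions(per_element_logits):
    """Average per-element class probabilities (assumed combination rule).

    Returns the index of the winning scene class and the mean probabilities.
    """
    if not per_element_logits:
        raise ValueError("need at least one sequence element")
    probs = [softmax(row) for row in per_element_logits]
    n_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    label = max(range(n_classes), key=mean.__getitem__)
    return label, mean


# Hypothetical logits: 3 object-level elements, 3 scene classes.
logits = [
    [2.0, 0.5, 0.1],  # element 1 favors class 0
    [0.2, 1.5, 0.3],  # element 2 favors class 1
    [1.8, 0.4, 0.2],  # element 3 favors class 0
]
label, mean_probs = aggregate_scene_predictions(logits)
print(label, [round(p, 3) for p in mean_probs])
```

Averaging lets a confident majority of elements outvote a single misleading object, which is one plausible reading of how the individual predictions could boost recognition. Other rules, such as majority voting or taking the maximum, would fit the same interface.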