Learning Common and Transferable Feature Representations for Multi-Modal Data

2020 
LiDAR sensors are crucial in automotive perception for accurate object detection. However, LiDAR data is hard for humans to interpret and consequently time-consuming to label, whereas camera data is easy to interpret and thus comparably simpler to label. In this work we present a transductive transfer learning approach to transfer knowledge for the object detection task from images to point cloud data. We propose a multi-modal adversarial auto-encoder architecture that disentangles uni-modal features into two groups: common (transferable) features and complementary (modality-specific) features. This disentanglement rests on the hypothesis that a set of common features exists. An important property of our framework is that the disentanglement is learned in an unsupervised manner. Furthermore, the results show that only a small amount of multi-modal data is needed to learn the disentanglement and thus to transfer knowledge between modalities. Our experiments show that when training with 75% less of the KITTI object data, the classification accuracy reaches 71.75%, only 3.12% less than when using the full data set. These findings can have great impact on perception pipelines based on LiDAR data.
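To make the described disentanglement concrete, the following is a minimal illustrative sketch (not the authors' code) of a two-modality auto-encoder whose latent space is split into a common, transferable part and a complementary, modality-specific part. All module names, dimensions, and the simple alignment loss standing in for the paper's adversarial objective are assumptions for illustration only.

```python
# Hypothetical sketch of common/complementary feature disentanglement.
# The adversarial and detection losses of the paper are omitted.
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Encodes one modality into common + complementary latent codes."""
    def __init__(self, in_dim, common_dim, private_dim):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.to_common = nn.Linear(256, common_dim)    # transferable features
        self.to_private = nn.Linear(256, private_dim)  # modality-specific features

    def forward(self, x):
        h = self.backbone(x)
        return self.to_common(h), self.to_private(h)

class ModalityDecoder(nn.Module):
    """Reconstructs one modality from the concatenated latent codes."""
    def __init__(self, out_dim, common_dim, private_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(common_dim + private_dim, 256), nn.ReLU(),
            nn.Linear(256, out_dim),
        )

    def forward(self, z_common, z_private):
        return self.net(torch.cat([z_common, z_private], dim=-1))

# Hypothetical feature dimensions for image and point-cloud inputs.
IMG_DIM, PC_DIM, COMMON_DIM, PRIVATE_DIM = 512, 1024, 64, 64

enc_img = ModalityEncoder(IMG_DIM, COMMON_DIM, PRIVATE_DIM)
enc_pc = ModalityEncoder(PC_DIM, COMMON_DIM, PRIVATE_DIM)
dec_img = ModalityDecoder(IMG_DIM, COMMON_DIM, PRIVATE_DIM)
dec_pc = ModalityDecoder(PC_DIM, COMMON_DIM, PRIVATE_DIM)

def unsupervised_loss(x_img, x_pc):
    """Unsupervised objective: reconstruct each modality and pull the two
    common codes together (a stand-in for the adversarial alignment)."""
    c_img, p_img = enc_img(x_img)
    c_pc, p_pc = enc_pc(x_pc)
    recon = nn.functional.mse_loss(dec_img(c_img, p_img), x_img) \
          + nn.functional.mse_loss(dec_pc(c_pc, p_pc), x_pc)
    align = nn.functional.mse_loss(c_img, c_pc)  # shared common space
    return recon + align

# Example step with random stand-in features for paired image/point-cloud data.
loss = unsupervised_loss(torch.randn(8, IMG_DIM), torch.randn(8, PC_DIM))
loss.backward()
```

In this sketch, a classifier trained on the common codes of one modality could then be applied to the common codes of the other, which is the transfer scenario the abstract describes.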