Multi-resolution 3D CNN for learning multi-scale spatial features in CAD models

2021 
Abstract

Learning multi-scale spatial features from 3D geometric representations of objects such as point clouds, 3D CAD models, surfaces, and RGB-D data can potentially improve object recognition accuracy. Current deep learning approaches learn such features using structured representations such as volume occupancy grids (voxels) and octrees, or unstructured representations such as graphs and point clouds. Structured representations are generally restricted by inherent limits on resolution, such as the voxel grid dimensions or the maximum octree depth, while learning directly from unstructured representations is challenging because of the non-uniformity among samples. A hierarchical approach that maintains structure at the large scale while still accounting for small-scale detail at specific spatial locations can provide an effective solution for learning from 3D data. In this paper, we propose a multi-level learning approach that captures large-scale features at a coarse level (for example, using a coarse voxelization) while simultaneously capturing sparse small-scale features at a fine level (for example, a local fine-level voxel grid) at different spatial locations. To demonstrate the utility of the proposed multi-resolution learning, we use a multi-level voxel representation of CAD models to perform object recognition. The representation consists of a coarse voxel grid containing the volumetric information of the 3D object and, for each coarse voxel that contains a portion of the object boundary, a corresponding fine-level voxel grid. In addition, we develop an interpretability-based feedback approach that transfers saliency information from one level of features to another in our hierarchical end-to-end learning framework. Finally, we demonstrate the performance of our multi-resolution learning algorithm for object recognition, outperforming several previously published benchmarks while using significantly less memory during training.
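To make the two-level representation concrete, the sketch below builds a coarse occupancy grid plus one fine boundary grid per occupied coarse voxel, and feeds them to a small two-branch 3D CNN. This is a minimal illustration, not the paper's implementation: the names (multilevel_voxelize, MultiResVoxNet), the 16^3/8^3 resolutions, the layer widths, and the mean-pool-and-concatenate fusion are all assumptions.

```python
# Minimal sketch of the multi-level voxel representation and a two-branch
# 3D CNN over it. Grid resolutions, layer widths, and the fusion scheme are
# illustrative assumptions, not the paper's architecture.
import numpy as np
import torch
import torch.nn as nn

COARSE, FINE = 16, 8  # assumed coarse and per-voxel fine grid resolutions

def multilevel_voxelize(points):
    """points: (N, 3) surface samples scaled to [0, 1)^3.
    Returns the coarse occupancy grid, one fine grid per occupied coarse
    voxel, and the coarse-grid index of each fine grid."""
    idx = np.clip((points * COARSE).astype(int), 0, COARSE - 1)
    coarse = np.zeros((COARSE,) * 3, dtype=np.float32)
    coarse[tuple(idx.T)] = 1.0
    cells = np.unique(idx, axis=0)            # occupied coarse voxels
    fine_grids = []
    for cell in cells:
        mask = (idx == cell).all(axis=1)
        local = points[mask] * COARSE - cell  # local coords in [0, 1)^3
        fidx = np.clip((local * FINE).astype(int), 0, FINE - 1)
        fine = np.zeros((FINE,) * 3, dtype=np.float32)
        fine[tuple(fidx.T)] = 1.0
        fine_grids.append(fine)
    return coarse, np.stack(fine_grids), cells

class MultiResVoxNet(nn.Module):
    """Coarse branch sees the whole object; a shared fine branch encodes
    each occupied voxel's local boundary grid, and the pooled fine code is
    fused with the coarse code before classification."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.coarse_net = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten())  # -> (1, 32)
        self.fine_net = nn.Sequential(
            nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten())  # -> (K, 8)
        self.head = nn.Linear(32 + 8, n_classes)

    def forward(self, coarse, fines):
        g = self.coarse_net(coarse[None, None])                  # (1, 32)
        f = self.fine_net(fines[:, None]).mean(0, keepdim=True)  # (1, 8)
        return self.head(torch.cat([g, f], dim=1))

# Usage on random surface samples of a sphere (stand-in for a CAD model).
pts = np.random.randn(2048, 3)
pts = pts / np.linalg.norm(pts, axis=1, keepdims=True) * 0.45 + 0.5
coarse, fines, cells = multilevel_voxelize(pts)
model = MultiResVoxNet()
logits = model(torch.from_numpy(coarse), torch.from_numpy(fines))
print(coarse.shape, fines.shape, logits.shape)
```

Because the fine branch is shared and runs only on coarse cells that actually contain boundary, memory grows with the number of occupied cells rather than with a dense high-resolution grid, which is consistent with the abstract's claim of reduced training memory.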
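The abstract also mentions an interpretability-based feedback step that transfers saliency information between levels, but does not spell out the mechanism. The sketch below is one plausible reading, not the authors' method: it computes a gradient-based saliency map over the coarse grid and uses it to re-weight the per-voxel fine features; the helper name and the weighting scheme are hypothetical.

```python
# Hypothetical saliency feedback: weight each fine grid's features by the
# gradient magnitude the coarse branch assigns to its parent voxel. This is
# a guess at the mechanism; the abstract only states that saliency is
# transferred between levels.
def saliency_weighted_fine_code(model, coarse, fines, cells, target):
    cells = torch.as_tensor(cells)
    coarse = coarse.clone().requires_grad_(True)
    logits = model(coarse, fines)
    logits[0, target].backward()
    sal = coarse.grad.abs()                         # coarse-level saliency
    w = sal[cells[:, 0], cells[:, 1], cells[:, 2]]  # one weight per grid
    w = w / (w.sum() + 1e-8)
    feats = model.fine_net(fines[:, None])          # (K, 8) fine codes
    return (w[:, None] * feats).sum(dim=0)          # saliency-weighted code

fine_code = saliency_weighted_fine_code(
    model, torch.from_numpy(coarse), torch.from_numpy(fines), cells,
    target=int(logits.argmax()))
print(fine_code.shape)
```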