3DMeT: 3D Medical Image Transformer for Knee Cartilage Defect Assessment.

2021 
While convolutional neural networks (CNNs) dominate computer-aided 3D medical image diagnosis, they cannot capture global information because of the intrinsic locality of convolution. Transformers, another type of neural network built on the self-attention mechanism, are good at representing global relations, yet they are computationally expensive and do not generalize well on small datasets. Applying Transformers to 3D medical images raises two major problems: 1) 3D medical volumes are larger than natural images, which makes training computationally impractical, and 2) 3D medical image datasets are usually smaller than natural image datasets, since medical images are expensive to collect. In this paper, we propose the 3D Medical image Transformer (3DMeT) to address these two issues. 3DMeT uses 3D convolutional layers to perform block embedding, in place of the original linear embedding, to cut the computational cost. Additionally, we propose a teacher-student training strategy that addresses the data-hungry issue by adapting the convolutional layers’ weights from a CNN teacher. We conduct experiments on knee images; the results demonstrate that 3DMeT (70.2) outperforms 3D CNNs (65.3) and the Vision Transformer (58.7).
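To make the computational argument concrete, the sketch below counts how many tokens a 3D volume yields for a given block size and how many token pairs self-attention must then compare. It is only an illustration of the general scaling: the volume size, the block sizes, and the helper names `num_tokens` and `attention_pairs` are assumptions chosen for the example, not values or code from the paper.

```python
from math import prod


def num_tokens(volume_shape, block_shape):
    """Count the non-overlapping blocks (tokens) a volume is split into."""
    if any(v % b for v, b in zip(volume_shape, block_shape)):
        raise ValueError("block shape must divide the volume shape evenly")
    return prod(v // b for v, b in zip(volume_shape, block_shape))


def attention_pairs(n_tokens):
    """Self-attention relates every token to every token: n * n pairs."""
    return n_tokens * n_tokens


# Hypothetical volume size, chosen only for illustration.
volume = (128, 128, 128)

for block in [(8, 8, 8), (16, 16, 16), (32, 32, 32)]:
    n = num_tokens(volume, block)
    print(f"block {block}: {n} tokens, {attention_pairs(n)} attention pairs")
```

Doubling the block edge length cuts the token count by a factor of 8 and the number of attention pairs by a factor of 64, which is why the choice of embedding has such a large effect on the cost of training on 3D volumes.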