Estimating Human Pose Efficiently by Parallel Pyramid Networks

2021 
Good performance and high efficiency are both critical for estimating human pose in practice. Recent state-of-the-art methods have greatly boosted pose detection accuracy through deep convolutional neural networks; however, the strong performance is typically achieved at the expense of efficiency. In this paper, we design a novel network architecture for human pose estimation that aims to strike a fine balance between speed and accuracy. Two essential tasks for successful pose estimation, preserving spatial location and extracting semantic information, are handled separately in the proposed architecture. Semantic knowledge of joint type is obtained through deep and wide sub-networks with low-resolution input, and high-resolution features indicating joint location are processed by shallow and narrow sub-networks. Because accurate semantic analysis mainly requires adequate network depth and width, whereas precise spatial information mostly requires preserving high-resolution features, good results can be produced by fusing the outputs of the sub-networks. Moreover, the computational cost can be considerably reduced compared with existing networks, since the main part of the proposed network only deals with low-resolution features. We refer to the architecture as the “parallel pyramid” network (PPNet), as features of different resolutions are processed at different levels of the hierarchical model. The superiority of our network is empirically demonstrated on two benchmark datasets: the MPII Human Pose dataset and the COCO keypoint detection dataset. PPNet outperforms all recent methods, using less computation and memory to achieve better human pose estimation results.
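The cost argument in the abstract can be illustrated with a rough back-of-the-envelope count. The sketch below is not PPNet itself: the feature-map sizes, channel widths, and layer counts are hypothetical values chosen only for illustration, and the helper names `conv_macs` and `branch_macs` are made up here. It compares a shallow, narrow branch at high resolution plus a deep, wide branch at low resolution against running the same deep, wide stack at high resolution.

```python
def conv_macs(height, width, c_in, c_out, kernel=3):
    """Multiply-accumulates for one stride-1, same-padded conv layer."""
    return height * width * c_in * c_out * kernel * kernel


def branch_macs(height, width, channels, num_layers):
    """Total MACs for a stack of identical 3x3 conv layers at one resolution."""
    return num_layers * conv_macs(height, width, channels, channels)


# Hypothetical settings for illustration only (not taken from the paper).
HIGH_RES = (64, 48)  # feature-map size kept for joint location
LOW_RES = (16, 12)   # 4x downsampled feature map for joint semantics

# Shallow and narrow sub-network on high-resolution features.
location_branch = branch_macs(*HIGH_RES, channels=32, num_layers=4)

# Deep and wide sub-network on low-resolution features.
semantic_branch = branch_macs(*LOW_RES, channels=128, num_layers=16)

# Baseline for comparison: the same deep, wide stack at high resolution.
deep_high_res = branch_macs(*HIGH_RES, channels=128, num_layers=16)

two_branch_total = location_branch + semantic_branch
ratio = deep_high_res / two_branch_total

print(f"location branch : {location_branch:,} MACs")
print(f"semantic branch : {semantic_branch:,} MACs")
print(f"two-branch total: {two_branch_total:,} MACs")
print(f"deep high-res   : {deep_high_res:,} MACs")
print(f"cost ratio      : {ratio:.1f}x")
```

Under these assumed settings, downsampling by 4 in each spatial dimension cuts the deep branch's cost by a factor of 16, which is why placing most of the depth and width at low resolution keeps the total small. The count ignores the fusion layers, downsampling and upsampling, and memory traffic, so it indicates the trend only and is not a measurement of the actual network.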