Athlete 3D pose estimation from a monocular TV sports video using pre-trained temporal convolutional networks

2020 
Our goal is to estimate athlete 3D pose from monocular TV sports video at a lower cost of collecting training data. To achieve this goal, we utilize a pre-trained deep neural network as a 3D pose estimator that estimates human 3D pose from the 2D joint locations of the person in each image. Each image in the popular datasets used to train such a 3D pose estimator is captured by a camera whose axis is parallel to the ground. In contrast, since an image in TV sports video is generally taken from a bird's-eye view, the joint locations of a person are distorted in the lower part of the image. It is therefore not appropriate to feed the person's 2D joint locations directly to the pre-trained 3D pose estimator. To resolve this problem, we propose to correct the 2D joint locations in an image of TV sports video by a homography transformation that maps points in the TV sports image to the corresponding points in an image taken by the camera used to capture the training data for the 3D pose estimator. Experimental results show that the proposed method can estimate athlete 3D pose from monocular TV sports video.
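The following is a minimal sketch of the correction step described above, not the authors' implementation: 2D joint locations detected in broadcast frames are mapped through a homography before being passed to a pre-trained 2D-to-3D lifter. The homography matrix H, the `lifter` object, and the helper names are assumptions introduced for illustration only.

```python
# Sketch only: H (3x3 homography from the broadcast image plane to the
# training-camera image plane) and `lifter` (a pre-trained temporal
# convolutional 2D-to-3D network) are hypothetical placeholders.
import numpy as np
import cv2


def correct_joints_with_homography(joints_2d: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Map per-frame 2D joint locations from the TV-broadcast image plane
    to the plane of the camera used to capture the 3D-pose training data.

    joints_2d: (T, J, 2) array of 2D joint coordinates in pixels.
    H:         (3, 3) homography matrix.
    Returns:   (T, J, 2) corrected joint coordinates.
    """
    T, J, _ = joints_2d.shape
    pts = joints_2d.reshape(-1, 1, 2).astype(np.float64)
    # cv2.perspectiveTransform applies (x', y') ~ H [x, y, 1]^T to each point.
    corrected = cv2.perspectiveTransform(pts, H)
    return corrected.reshape(T, J, 2)


# Hypothetical usage: detect 2D keypoints, correct them, then lift to 3D.
# joints_2d    = detect_2d_keypoints(video_frames)           # off-the-shelf 2D detector
# corrected_2d = correct_joints_with_homography(joints_2d, H)
# poses_3d     = lifter.predict(corrected_2d)                 # pre-trained 2D-to-3D lifter
```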