Abstract: |
Human motion recognition is the automatic identification of human behaviors from images or video sequences. It is particularly important in computer vision and pattern recognition because of its wide applicability in human-computer interaction, video surveillance, robotics, game control, and motion video analysis. Earlier work recognized actions from RGB data, which is affected by complex illumination conditions and cluttered backgrounds and is therefore limited in practical applications. Skeleton data is a higher-level motion feature that contains the positions of the human joints. It is robust to scale and illumination changes and can be invariant to camera view, body rotation, and motion speed, so action recognition based on skeleton data has received widespread attention. Convolutional neural networks have achieved excellent performance in many computer vision tasks, especially image classification, and are an important research direction in action recognition. However, the keys to recognizing actions with them are how to extract the temporal and spatial information from the skeleton sequence effectively and how to encode the skeleton sequence into an image. In this paper, a method based on skeleton data is proposed. The skeleton sequences are encoded into a Temporal Pyramid Skeleton Motion Map and fed into a convolutional neural network for feature extraction and classification. Firstly, each frame of the human skeleton sequence is projected onto three orthogonal planes to generate a Skeleton Sequence Distribution Map, which captures spatial information.
Secondly, a temporal pyramid method is used to segment the Skeleton Sequence Distribution Map into Segmented Skeleton Sequence Distribution Maps, and the motion energy is then stacked over each Segmented Skeleton Sequence Distribution Map to generate a Segmented Skeleton Motion Map, which captures the temporal information of the motion. Thirdly, each Segmented Skeleton Motion Map is processed with pseudo-color coding to obtain richer color and texture information. Fourthly, an addition update strategy is used to generate the Temporal Pyramid Skeleton Motion Map. Finally, the features of the Temporal Pyramid Skeleton Motion Map are extracted and classified by a convolutional neural network. The proposed method is evaluated on three action recognition datasets. It obtains a recognition accuracy of 83.74% on the SYSU-12 dataset, 97.675% on the MSR-12 dataset, and 92.56% on the UTD-MHAD dataset. Compared with existing action recognition algorithms, it achieves good experimental results. |
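The encoding steps summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so every name and parameter here is a hypothetical choice made for illustration. In particular, the sketch assumes joint coordinates normalized to [0, 1], a 64 x 64 grid, motion energy taken as the sum of absolute differences between consecutive frames, a simple blue-to-red color ramp as the pseudo-color coding, three pyramid levels, and plain summation as the addition update.

```python
import numpy as np

def project_frame(joints, size=64):
    """Rasterize one frame of joints, shape (J, 3) in [0, 1], onto the xy, yz and xz planes."""
    idx = np.clip((joints * (size - 1)).round().astype(int), 0, size - 1)
    maps = np.zeros((3, size, size), dtype=np.float32)
    for plane, (a, b) in enumerate([(0, 1), (1, 2), (0, 2)]):
        maps[plane, idx[:, b], idx[:, a]] = 1.0  # mark each joint position
    return maps

def motion_map(frame_maps):
    """Stack motion energy over a segment: (T, 3, H, W) -> (3, H, W).

    Assumption: energy is the summed absolute difference of consecutive frames.
    """
    if len(frame_maps) < 2:
        return np.zeros_like(frame_maps[0])
    return np.abs(np.diff(frame_maps, axis=0)).sum(axis=0)

def pseudo_color(gray):
    """Map a grayscale map (H, W) to RGB (H, W, 3) with a blue-to-red ramp (an assumed colormap)."""
    peak = gray.max()
    g = gray / peak if peak > 0 else gray
    return np.stack([g, 1.0 - np.abs(2.0 * g - 1.0), 1.0 - g], axis=-1)

def temporal_pyramid_motion_map(sequence, levels=3, size=64):
    """Encode a skeleton sequence (T, J, 3) as three color images, shape (3, size, size, 3)."""
    frames = np.stack([project_frame(f, size) for f in sequence])
    out = np.zeros((3, size, size, 3), dtype=np.float32)
    for level in range(levels):
        # Level l of the pyramid splits the sequence into 2**l temporal segments.
        for segment in np.array_split(frames, 2 ** level):
            if len(segment) == 0:
                continue
            energy = motion_map(segment)
            for plane in range(3):
                out[plane] += pseudo_color(energy[plane])  # addition update
    return out
```

Under these assumptions, each of the three output images (one per projection plane) would be rescaled to the input range expected by the chosen convolutional neural network before classification.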