Abstract: |
Recent methods based on 3D skeleton data have achieved outstanding performance owing to the conciseness, robustness, and view-independence of the skeleton representation. With the development of deep learning, Convolutional Neural Network (CNN)- and Long Short-Term Memory (LSTM)-based learning methods have achieved promising performance for action recognition. However, CNN-based methods inevitably lose temporal information when a sequence is encoded into images. To capture as much spatial-temporal information as possible, we adopt both LSTM and CNN models and combine their outputs through late score fusion for effective recognition. In addition, a depth-based joint trajectory map (DTM), which captures richer information, is fed into the CNN model. Our method achieves state-of-the-art results on the UTD-MHAD and NTU RGB+D datasets for 3D human action analysis.