1. Gayathri T, Mamatha H. How to Improve Video Analytics with Action
Recognition: A Survey. ACM Computing Surveys. 2023;57(1).
2. Armeni I, Sener O, Zamir AR, Jiang H, Brilakis I, Fischer M, et al. 3D Semantic
Parsing of Large-Scale Indoor Spaces. Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition. 2016; p. 1534–1543.
3. Li Y, Yu AW, Meng T, Caine B, Ngiam J, Peng D, et al. DeepFusion:
LiDAR-Camera Deep Fusion for Multi-Modal 3D Object Detection. Proceedings
of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
4. Zhu C, Jia Q, Chen W, Guo Y, Liu Y. Deep Learning for Video-Text Retrieval:
A Review. International Journal of Multimedia Information Retrieval.
5. Yang S, Liu J, Lu S, Hwa EM, Hu Y, Kot AC. Self-supervised 3D action
representation learning with skeleton cloud colorization. IEEE Transactions on
Pattern Analysis and Machine Intelligence. 2023;46(1):509–524.
6. Yan S, Xiong Y, Lin D. Spatial temporal graph convolutional networks for
skeleton-based action recognition. Proceedings of the AAAI conference on
artificial intelligence. 2018;32(1).
7. Yang G, Yang Y, Lu Z, Yang J, Liu D, Zhou C, et al. STA-TSN: spatial-temporal
attention temporal segment network for action recognition in video. PLOS ONE.
8. Huang KH, Huang YB, Lin YX, Hua KL, Tanveer M, Lu X, et al. GRA: Graph
Representation Alignment for Semi-Supervised Action Recognition. IEEE
Transactions on Neural Networks and Learning Systems. 2024;35(9):11896–11905.
9. Dai C, Wei Y, Xu Z, Chen M, Liu Y, Fan J. ConMLP: MLP-based
self-supervised contrastive learning for skeleton data analysis and action
10. Li D, Tang Y, Zhang Z, Zhang W. Cross-stream contrastive learning for
self-supervised skeleton-based action recognition. Image and Vision Computing.
11. Yang H, Zhang Q, Ren Z, Yuan H, Zhang F. Contrastive Learning with
Cross-Part Bidirectional Distillation for Self-supervised Skeleton-Based Action
Recognition. Human-centric Computing and Information
12. Wu W, Hua Y, Zheng C, Wu S, Chen C, Lu A. SkeletonMAE: Spatial-temporal
masked autoencoders for self-supervised skeleton action recognition. 2023 IEEE
international conference on multimedia and expo workshops (ICMEW). 2023; p.
13. Hu J, Hou Y, Guo Z, Gao J. Global and local contrastive learning for
self-supervised skeleton-based action recognition. IEEE Transactions on Circuits
and Systems for Video Technology. 2024;34:10578–10589.
14. Zhu Y, Han H, Yu Z, Liu G. Modeling the relative visual tempo for
self-supervised skeleton-based action recognition. Proceedings of the IEEE/CVF
International Conference on Computer Vision. 2023; p. 13913–13922.
15. Cheng K, Zhang Y, He X, et al. Skeleton-based action recognition with shift
graph convolutional network. Proceedings of the IEEE/CVF conference on
computer vision and pattern recognition. 2020; p. 183–192.
16. Tian H, Ma X, Li X, et al. Skeleton-based action recognition with
select-assemble-normalize graph convolutional networks. IEEE Transactions on
17. Jang S, Lee H, Kim WJ, et al. Multi-scale structural graph convolutional
network for skeleton-based action recognition. IEEE Transactions on Circuits and
Systems for Video Technology. 2024;34(8):7244–7258.
18. Zhang Y, Yang Y, Gao X. Lightweight Graph Convolutional Network For
Efficient Skeleton Based Action Recognition. 2024 International Joint Conference
on Neural Networks (IJCNN). 2024; p. 1–8.
19. Ren Z, Luo L, Qin Y. Skeleton-guided and supervised learning of hybrid network
for multi-modal action recognition. Journal of Visual Communication and Image
20. Xu J, Zhu A, Lin J, Ke Q, Chen C. Skeleton-OOD: An End-to-End
Skeleton-Based Model for Robust Out-of-Distribution Human Action Detection.
Neurocomputing. 2025;619.
21. Aouaidjia K, Zhang C, Pitas I. Spatio-temporal invariant descriptors for
skeleton-based human action recognition. Information Sciences. 2025;700:121832.
22. Wu S, Lu G, Han Z, Chen L. A robust two-stage framework for human skeleton
action recognition with GAIN and masked autoencoder. Neurocomputing.
23. Huang H, Xu L, Zheng Y, Yan X. MAFormer: A cross-channel spatio-temporal
feature aggregation method for human action recognition. AI Communications.
24. Zhao Z, Liu Y, Ma L. Compositional action recognition with multi-view feature
fusion. PLOS ONE. 2022;17(4):e0266259.
25. Yang H, Wang S, Jiang L, Su Y, Zhang Y. Hierarchical adaptive multi-scale
hypergraph attention convolution network for skeleton-based action recognition.
Applied Soft Computing. 2025;172:112855.
26. Zhu S, Sun L, Ma Z, Li C, He D. Prompt-supervised dynamic attention graph
convolutional network for skeleton-based action recognition. Neurocomputing.
27. Xu Z, Xu J. Spatiotemporal decoupling attention transformer for 3D
skeleton-based driver action recognition. Complex & Intelligent Systems.
28. Huang B, Wang S, Hu C, Li X. Semi-supervised human action recognition via
dual-stream cross-fusion and class-aware memory bank. Engineering Applications
of Artificial Intelligence. 2024;136:108937.
29. Lin L, Zhang J, Liu J. Actionlet-dependent contrastive learning for unsupervised
skeleton-based action recognition. Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition. 2023; p. 2363–2372.
30. Liu X, Gao B. Reconstruction-driven contrastive learning for unsupervised
skeleton-based human action recognition. The Journal of Supercomputing.
31. He Z, Lv J, Fang S. Representation modeling learning with multi-domain
decoupling for unsupervised skeleton-based action recognition. Neurocomputing.
32. Liu Z, Lu B, Wu Y, Gao C. Multi-view daily action recognition based on Hooke
balanced matrix and broad learning system. Image and Vision Computing.
33. Lin L, Wu L, Zhang J, Liu J. Idempotent Unsupervised Representation Learning
for Skeleton-Based Action Recognition. European Conference on Computer
34. Jin Z, Wang Y, Wang Q, Shen Y, Meng H. SSRL: Self-supervised
spatial-temporal representation learning for 3D action recognition. IEEE
Transactions on Circuits and Systems for Video Technology. 2023;34(1):274–285.
35. Yao S, Ping Y, Yue X, Chen H. Graph Convolutional Networks for multi-modal
robotic martial arts leg pose recognition. Frontiers in Neurorobotics.
36. Wu C, Wu XJ, Kittler J, Xu T, Ahmed S, Awais M, et al. SCD-Net:
Spatiotemporal clues disentanglement network for self-supervised skeleton-based
action recognition. Proceedings of the AAAI conference on artificial intelligence.
37. Moutik O, Sekkat H, Ait Tchakoucht T, El Kari B, Alaoui AEH. A puzzle
questions form training for self-supervised skeleton-based action recognition.
Image and Vision Computing. 2024;148:105137.
38. Guan S, Yu X, Huang W, Fang G, Lu H. DMMG: dual min-max games for
self-supervised skeleton-based action recognition. IEEE Transactions on Image
39. Guo T, Liu M, Liu H, Wang G, Li W. Improving self-supervised action
recognition from extremely augmented skeleton sequences. Pattern Recognition.
40. Wang M, Li X, Chen S, Zhang X, Ma L, Zhang Y. Learning representations by
contrastive spatio-temporal clustering for skeleton-based action recognition. IEEE
Transactions on Multimedia. 2023;26:3207–3220.
41. Wu Y, Xu Z, Yuan M, Tang T, Meng R, Wang Z. Multi-scale motion contrastive
learning for self-supervised skeleton-based action recognition. Multimedia
42. Liu R, Liu Y, Wu M, Xin W, Miao Q, Liu X, et al. SG-CLR: Semantic
representation-guided contrastive learning for self-supervised skeleton-based
action recognition. Pattern Recognition. 2025; p. 111377.
43. Shahroudy A, Liu J, Ng TT, Wang G. NTU RGB+D: A large scale dataset for 3D
human activity analysis. Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition. 2016; p. 1010–1019.
44. Liu J, Shahroudy A, Perez M, Wang G, Duan LY, Kot AC. NTU RGB+D 120: A
large-scale benchmark for 3D human activity understanding. IEEE Transactions
on Pattern Analysis and Machine Intelligence. 2019;42(10):2684–2701.
45. Liu J, Song S, Liu C, et al. A benchmark dataset and comparison study for
multi-modal human action analytics. ACM Transactions on Multimedia
Computing, Communications, and Applications (TOMM). 2020;16(2):1–24.
46. Guo T, Liu H, Chen Z, Liu M, Wang T, Ding R. Contrastive learning from
extremely augmented skeleton sequences for self-supervised action recognition.
Proceedings of the AAAI conference on artificial intelligence. 2022;36(1):762–770.
47. Kim B, Chang HJ, Kim J, Choi JY. Global-local motion transformer for
unsupervised skeleton-based action learning. European conference on computer
48. Zhang H, Hou Y, Zhang W, Li W. Contrastive positive mining for unsupervised
3d action representation learning. European Conference on Computer Vision.
49. Mao Y, Zhou W, Lu Z, Deng J, Li H. CMD: Self-supervised 3D action
representation learning with cross-modal mutual distillation. European
Conference on Computer Vision. 2022; p. 734–752.
50. Dong J, Sun S, Liu Z, Chen S, Liu B, Wang X. Hierarchical contrast for
unsupervised skeleton-based action representation learning. Proceedings of the
AAAI Conference on Artificial Intelligence. 2023;37(1):525–533.
51. Chen YX, Zhao L, Yuan JB, Tian Y, Xia ZY, Geng SJ, et al.
Hierarchically self-supervised transformer for human skeleton representation
learning. European Conference on Computer Vision. 2022; p. 185–202.
52. Zheng N, Wen J, Liu R, Long L, Dai J, Gong Z. Unsupervised representation
learning with long-term dynamics for skeleton based action recognition.
Proceedings of the AAAI conference on artificial intelligence. 2018;32(1).
53. Lin L, Song S, Yang W, Liu J. MS2L: Multi-task self-supervised learning for
skeleton based action recognition. Proceedings of the 28th ACM international
conference on multimedia. 2020; p. 2490–2498.
54. Thoker FM, Doughty H, Snoek CGM. Skeleton-contrastive 3D action
representation learning. Proceedings of the 29th ACM international conference on