A Precise Recognition and Evaluation System for Tennis Forehand Actions Based on a Hybrid Perception Attention Network
DOI: https://doi.org/10.62517/jbdc.202501207
Author: Bocheng He
Affiliation: Faculty of Innovation Engineering, Macao University of Science and Technology, Macao, China
Abstract
This study proposes a recognition and evaluation system for tennis forehand actions based on a Hybrid Perception Attention Network (HPA-Net). To address the limitations of existing action-recognition models on high-speed, fine-grained tennis strokes, we designed a network architecture that integrates spatial and temporal attention mechanisms to precisely perceive the critical technical phases of the forehand. The system also incorporates a dynamic scoring method that adaptively emphasizes the improvement areas most relevant to players at different skill levels. Experiments show that HPA-Net achieves a forehand action-recognition accuracy of 94.3% and a posture-evaluation overlap rate of 91.2%, significantly outperforming existing methods. The system has broad applications in tennis training assistance, match technique analysis, and individual skill development: it gives coaches objective, quantitative teaching references, amateur players professional-grade technical guidance, and athletes critical data support for in-match technical analysis. Beyond introducing a novel algorithmic framework for tennis technique analysis, this study lays a methodological foundation for evaluating other fine-grained sports techniques.
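The abstract does not detail HPA-Net's internal architecture, so the following PyTorch sketch is only a hedged illustration of how a hybrid spatial-temporal attention block of the kind described might be composed. The class name, layer choices, dimensions, and fusion order below are assumptions for illustration, not the authors' implementation.

```python
# Minimal, illustrative sketch of a hybrid spatial-temporal attention block.
# Every layer, dimension, and the fusion strategy here is a hypothetical
# reconstruction; the paper's abstract does not specify HPA-Net's internals.
import torch
import torch.nn as nn

class HybridPerceptionAttention(nn.Module):
    """Spatial attention per frame, then temporal attention across frames."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Spatial attention: 1x1 conv producing a per-pixel saliency map.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )
        # Temporal attention: multi-head self-attention over the frame axis.
        self.temporal_attn = nn.MultiheadAttention(
            embed_dim=channels, num_heads=num_heads, batch_first=True
        )
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, channels, height, width)
        b, t, c, h, w = x.shape
        frames = x.reshape(b * t, c, h, w)
        # Re-weight each frame's feature map by its spatial saliency.
        frames = frames * self.spatial_gate(frames)
        # Pool space away, leaving one descriptor per frame.
        tokens = frames.mean(dim=(2, 3)).reshape(b, t, c)
        # Let frames attend to each other to emphasize key stroke phases.
        attended, _ = self.temporal_attn(tokens, tokens, tokens)
        return self.norm(tokens + attended)  # (batch, frames, channels)

# Toy usage: 2 clips, 16 frames, 64-channel 14x14 feature maps.
feats = torch.randn(2, 16, 64, 14, 14)
out = HybridPerceptionAttention(channels=64)(feats)
print(out.shape)  # torch.Size([2, 16, 64])
```

Similarly, the "dynamic scoring" mechanism is only named in the abstract. One plausible reading, sketched below, is that per-phase technique scores are re-weighted both by a skill-level profile and by a deficit term, so that a player's weaker phases dominate the final evaluation. All phase names and weight values are illustrative assumptions.

```python
# Hypothetical sketch of skill-adaptive "dynamic scoring": phases scored
# lower receive proportionally more weight. Phase names and reference
# weights are assumptions for illustration, not values from the paper.
PHASES = ["preparation", "backswing", "contact", "follow_through"]

LEVEL_WEIGHTS = {  # assumed per-level emphasis profiles
    "beginner":     [0.35, 0.30, 0.20, 0.15],
    "intermediate": [0.20, 0.25, 0.35, 0.20],
    "advanced":     [0.10, 0.20, 0.40, 0.30],
}

def dynamic_score(phase_scores: dict, level: str) -> float:
    """Blend fixed level weights with a deficit term boosting weak phases."""
    base = LEVEL_WEIGHTS[level]
    # Deficit weighting: a phase scored 0.5 gets a 0.5 boost, etc.
    deficits = [1.0 - phase_scores[p] for p in PHASES]
    total = sum(b + d for b, d in zip(base, deficits))
    weights = [(b + d) / total for b, d in zip(base, deficits)]
    return sum(w * phase_scores[p] for w, p in zip(weights, PHASES))

scores = {"preparation": 0.9, "backswing": 0.7,
          "contact": 0.5, "follow_through": 0.8}
print(round(dynamic_score(scores, "intermediate"), 3))  # -> 0.667
```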
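A design note on the scoring sketch: normalizing the blended weights keeps the final score on the same 0-1 scale as the per-phase inputs, while the deficit term is what makes the feedback "adaptive" in the sense the abstract describes, steering attention toward whichever phase a given player executes worst.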
Keywords
Tennis Forehand; Hybrid Perception Attention Network; Dynamic Scoring; Transfer Learning; Few-Shot Learning