Human action recognition based on ConvGRU and attention feature fusion

(贵州大学 大数据与信息工程学院,贵州 贵阳 550025)

About the author:

ZHANG Rongfen (b. 1977), female, Ph.D., professor and master's supervisor; her main research interests include machine vision, intelligent algorithms, and intelligent hardware.


Funding:

Supported by the Guizhou Provincial Science and Technology Foundation (Qiankehe Foundation-ZK[2021] Key 001).




    Abstract:

    In action recognition tasks, fully learning and exploiting the correlation between the spatial and temporal features of a video is particularly important for the final recognition result. To address the problem that traditional action recognition methods ignore the correlation of spatio-temporal features as well as small features, which degrades recognition accuracy, this paper proposes a human action recognition method based on the convolutional gated recurrent unit (ConvGRU) and attentional feature fusion (AFF). First, the Xception network is used as the spatial feature extraction network for video frames, and a spatial-temporal excitation (STE) module and a channel excitation (CE) module are introduced to strengthen the modeling of temporal actions while spatial features are extracted. In addition, the traditional long short-term memory (LSTM) network is replaced by a ConvGRU network, which uses convolution to further mine the spatial features of video frames while extracting temporal features. Finally, the output classifier is improved by introducing a feature fusion module based on improved multi-scale channel attention (MCAM-AFF), which strengthens the recognition of small features and improves the accuracy of the model. Experimental results show that recognition accuracies of 95.66% and 69.82% are achieved on the UCF101 and HMDB51 datasets, respectively. The proposed algorithm obtains more complete spatio-temporal features and outperforms current mainstream models.
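
    The ConvGRU named in the abstract replaces the GRU's matrix multiplications with convolutions, so the recurrent hidden state keeps its 2D spatial layout while modeling temporal dynamics. A minimal single-channel NumPy sketch of such a cell is shown below; the kernel size, random weights, and frame size are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel 2D convolution with 'same' zero padding."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class ConvGRUCell:
    """Single-channel ConvGRU cell: the GRU gate equations, with every
    matrix multiply replaced by a 2D convolution (illustrative weights)."""
    def __init__(self, ksize=3, seed=0):
        rng = np.random.default_rng(seed)
        # one input kernel and one hidden-state kernel per gate (z, r, candidate h)
        self.k = {g: (rng.normal(scale=0.1, size=(ksize, ksize)),
                      rng.normal(scale=0.1, size=(ksize, ksize)))
                  for g in ("z", "r", "h")}

    def step(self, x, h):
        kz_x, kz_h = self.k["z"]
        kr_x, kr_h = self.k["r"]
        kh_x, kh_h = self.k["h"]
        z = sigmoid(conv2d_same(x, kz_x) + conv2d_same(h, kz_h))  # update gate
        r = sigmoid(conv2d_same(x, kr_x) + conv2d_same(h, kr_h))  # reset gate
        h_tilde = np.tanh(conv2d_same(x, kh_x) + conv2d_same(r * h, kh_h))
        return (1 - z) * h + z * h_tilde  # convex blend of old state and candidate

# Run a short sequence of random "frames" through the cell.
cell = ConvGRUCell()
h = np.zeros((8, 8))
for t in range(4):
    frame = np.random.default_rng(t).normal(size=(8, 8))
    h = cell.step(frame, h)
print(h.shape)  # the hidden state keeps the 8x8 spatial layout
```

    In a full model, each gate would be a multi-channel convolution and the per-frame spatial features from the backbone would be fed in as `x`; the point of the sketch is only that the recurrence operates on feature maps rather than flattened vectors.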

Cite this article:

CHENG Nana, ZHANG Rongfen, LIU Yuhong, LIU Yuan, LIU Xinfei, YANG Shuang. Human action recognition based on ConvGRU and attention feature fusion [J]. 光电子激光 (Journal of Optoelectronics·Laser), 2023, 34(12): 1298-1306.

History:
  • Received: 2023-03-21
  • Revised: 2023-06-15
  • Published online: 2024-01-03