Image classification based on transformer adaptive feature vector fusion
DOI:
CSTR:
Authors:
Affiliation:

(Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China)

Author biography:

LI Fan (1986-), male, Ph.D. in engineering, associate professor, master's supervisor; his main research interests include computer vision and image processing.

Corresponding author:

CLC number:

Funding:

Supported by the National Natural Science Foundation of China (61862036, 61962030, 81860318)



Abstract:

To address the poor performance of current transformer-based image classification models when applied directly to small datasets, this paper proposes a transformer adaptive feature vector fusion network. In the feature extractor, the network fuses features from different stages, reducing the loss of feature information while capturing more information under different receptive fields, and applies max pooling to remove redundant information from the features, making the extracted features more discriminative. In addition, to fully exploit the feature information at all levels of the image for classification, the feature vectors produced at each stage of the network are fused, giving the fused feature vector stronger representational power; this reduces the network's dependence on large datasets and allows it to perform well on small datasets as well. Experiments show that the proposed algorithm reaches TOP-1 accuracies of 74.22%, 85.86% and 81.4% on the Mini-ImageNet-100, CIFAR-100 and ImageNet-1k datasets, respectively, improving on the baseline by 6.0%, 3.0% and 0.1% without increasing the amount of computation, while reducing the parameter count by 18.3%. The code is open-sourced at https://github.com/xhutongxue/afvf.

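The stage-wise feature-vector fusion described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the learnable softmax weighting over per-stage vectors, the three-stage setup, and the 8-dimensional toy features are all assumptions made for the example.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_stage_vectors(stage_vectors, weights):
    """Adaptively fuse per-stage feature vectors into one classification vector.

    stage_vectors: list of (d,) arrays, one feature vector per network stage.
    weights: (n_stages,) learnable logits; softmax turns them into fusion weights.
    """
    alphas = softmax(weights)                       # adaptive weights, sum to 1
    stacked = np.stack(stage_vectors)               # (n_stages, d)
    return (alphas[:, None] * stacked).sum(axis=0)  # weighted sum -> (d,)

# Toy example: three stages producing 8-dimensional feature vectors.
rng = np.random.default_rng(0)
vecs = [rng.standard_normal(8) for _ in range(3)]
w = np.array([0.2, 0.5, 1.0])  # logits that a training loop would learn
fused = fuse_stage_vectors(vecs, w)
```

In a real network the fused vector would then feed the classification head, and `w` would be updated by backpropagation along with the rest of the model; the point of the sketch is only that late stages need not be the sole source of the classification feature.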

Cite this article:

HU Yi, HUANG Bochun, LI Fan. Image classification based on transformer adaptive feature vector fusion[J]. Journal of Optoelectronics·Laser, 2023, 34(6): 602-609.

History
  • Received: 2022-04-27
  • Revised: 2022-06-26
  • Accepted:
  • Published online: 2023-06-14
  • Published in print: