Abstract: Transformer-based image classification models perform poorly when applied directly to small datasets. To address this problem, this paper proposes a transformer adaptive feature vector fusion network. The network fuses features from different stages of the feature extractor, which reduces the loss of feature information and captures more information under different receptive fields, and it uses max pooling to remove redundant feature information so that the extracted features are more discriminative. In addition, to make full use of the feature information at all levels of the image for classification prediction, the feature vectors generated at each stage of the network are fused, making the fused feature vector more representative. Together, these measures reduce the network's dependence on large datasets, so that it also performs well on small datasets. Experiments show that the proposed algorithm achieves top-1 accuracies of 74.22%, 85.86%, and 81.4% on the Mini-ImageNet-100, CIFAR-100, and ImageNet-1k datasets, respectively. Without increasing the amount of computation, it improves on the baselines by 6.0%, 3.0%, and 0.1%, respectively, while reducing the number of parameters by 18.3%. The code for this paper is open source at "https://github.com/xhutong xue/afvf".
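
The stage-wise feature vector fusion summarized in the abstract can be illustrated with a minimal sketch. The abstract does not state how the fusion weights are obtained, so this example assumes one scalar score per stage (standing in for a learnable parameter), normalized with a softmax, and assumes that all stage vectors have already been projected to a common dimension. The names `softmax` and `fuse_feature_vectors` are hypothetical and are not taken from the authors' code.

```python
import math


def softmax(scores):
    """Normalize a list of scalar scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_feature_vectors(stage_vectors, stage_scores):
    """Fuse per-stage feature vectors into one vector by a weighted sum.

    Assumption: each stage contributes one vector of the same length, and
    each stage has one scalar score (hypothetically a learned parameter).
    """
    if len(stage_vectors) != len(stage_scores):
        raise ValueError("need exactly one score per stage vector")
    dim = len(stage_vectors[0])
    if any(len(vec) != dim for vec in stage_vectors):
        raise ValueError("all stage vectors must share one dimension")

    weights = softmax(stage_scores)
    # Weighted sum over stages, computed independently for each dimension.
    return [
        sum(w * vec[i] for w, vec in zip(weights, stage_vectors))
        for i in range(dim)
    ]


# Usage: three stages, each producing a 4-dimensional feature vector.
stage_vectors = [
    [0.2, 0.1, 0.0, 0.5],
    [0.4, 0.3, 0.1, 0.2],
    [0.9, 0.0, 0.7, 0.1],
]
stage_scores = [0.0, 0.5, 1.0]  # illustrative values only
fused = fuse_feature_vectors(stage_vectors, stage_scores)
```

With equal scores the fusion reduces to a plain average of the stage vectors; unequal scores let later or earlier stages dominate, which is one plausible reading of "adaptive" in the network's name.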