Abstract: Sensor fusion is an important part of the perception system in autonomous driving, and fusing radar point cloud information with visual information can improve vehicle perception. However, existing studies that project radar points onto images simply add height to the radar points, which provides no additional lateral information and thus lacks spatial information. Moreover, the fusion of the two modalities is only a simple combination: it produces a joint representation but is insufficient to fully capture the complex connections between the two modalities. In this paper, we additionally increase the width of the projected radar points to enhance spatial information, and we design a cross-modal interaction fusion method for the two modalities based on differential feature attention fusion. The model is evaluated on the challenging nuScenes dataset and achieves an NDS score of 46.3% and an mAP of 33.9%, demonstrating excellent performance.
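To make the width-expansion idea concrete, the following is a minimal sketch in Python of projecting radar points onto the image plane and extending each projected point in width as well as height, so the radar channel carries lateral extent. The function name, parameter names, and expansion sizes are illustrative assumptions, not the paper's actual implementation or values.

```python
import numpy as np

def project_and_expand(radar_points, cam_intrinsic, width_px=4, height_px=40):
    """Project 3D radar points (N, 3), given in camera coordinates, onto the
    image plane and expand each projected point into a pixel region with both
    width and height.

    Hypothetical sketch: names and default sizes are assumptions, not the
    paper's method.
    """
    # Perspective projection: u = fx * x/z + cx, v = fy * y/z + cy.
    pts = radar_points / radar_points[:, 2:3]   # normalize by depth z
    uv = pts @ cam_intrinsic.T                  # (N, 3); last column is 1

    regions = []
    for u, v, _ in uv:
        # Expand each projected point laterally (width_px) and vertically
        # (height_px), yielding a region instead of a single pixel column.
        u0, u1 = int(u - width_px / 2), int(u + width_px / 2)
        v0, v1 = int(v - height_px), int(v)
        regions.append((u0, v0, u1, v1))
    return regions
```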