Object detection algorithm based on DSGIoU loss and dual branch coordinate attention

MA Sugang; LI Ningbo; HOU Zhiqiang; YU Wangsheng; YANG Xiaobao

doi:10.13700/j.bh.1001-5965.2023.0192

Volume 51 Issue 4

Apr. 2025

Journal of Beijing University of Aeronautics and Astronautics, 2025, 51(4): 1085-1095.

Citation: MA S G, LI N B, HOU Z Q, et al. Object detection algorithm based on DSGIoU loss and dual branch coordinate attention[J]. Journal of Beijing University of Aeronautics and Astronautics, 2025, 51(4): 1085-1095 (in Chinese). doi: 10.13700/j.bh.1001-5965.2023.0192

MA Sugang^{1,2}, LI Ningbo^{1}, HOU Zhiqiang^{1,2}, YU Wangsheng^{3}, YANG Xiaobao^{1}

1. School of Computer Science & Technology, Xi’an University of Posts & Telecommunications, Xi’an 710121, China
2. Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi’an University of Posts & Telecommunications, Xi’an 710121, China
3. School of Information and Navigation, Air Force Engineering University, Xi’an 710077, China

Funds: National Natural Science Foundation of China (62072370); Natural Science Foundation of Shaanxi Province (2023-JC-YB-598); Science and Technology Project of Xi’an City (22GXFW0125)

Corresponding author: E-mail: msg@xupt.edu.cn
Received Date: 21 Apr 2023
Accepted Date: 15 May 2023

Available Online: 30 Jun 2023

Publish Date: 19 Jun 2023

Abstract

In the YOLOX algorithm, the bounding box regression loss is of limited effectiveness and the multi-scale feature representation ability is insufficient, which leads to inaccurate detection results. To address these issues, an object detection algorithm based on the distance shape of generalized intersection over union (DSGIoU) loss and dual branch coordinate attention was proposed. Starting from the intersection over union (IoU) loss term, three penalty terms were added to improve the regression convergence of the bounding box: the non-overlapping area, the distance between the box centers, and the aspect ratios of the ground-truth box and the predicted box. Meanwhile, features were encoded in two directions using both average pooling and max pooling to obtain direction-aware and position information, thereby enhancing the features. To evaluate the detection performance of the proposed algorithm, experiments were carried out on the PASCAL VOC and KITTI datasets, with YOLOX of the Tiny, S, and M network sizes as the baselines. The experimental results show that on the PASCAL VOC dataset the detection accuracy of the proposed algorithm reaches 80.0%, 82.6%, and 85.8% for the Tiny, S, and M sizes, respectively, which is 1.5%, 1.6%, and 2.0% higher than the corresponding YOLOX baselines. On the KITTI dataset, the detection accuracy reaches 87.7%, 89.7%, and 90.7%, increases of 1.7%, 2.9%, and 1.3%, respectively. The proposed algorithm improves network convergence and the representation ability of multi-scale features, and significantly boosts detection accuracy.
Keywords: object detection; loss function; bounding box regression; coordinate attention; YOLOX
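The abstract names the three penalty terms added to the IoU loss but not their exact form or weighting. The following is a minimal sketch under stated assumptions: the non-overlapping-area term is taken in the GIoU form, the center-distance term in the DIoU form, and the aspect-ratio term as the CIoU-style v term, all added with unit weight to 1 - IoU. The function name `dsgiou_style_loss` and the `(x1, y1, x2, y2)` box layout are hypothetical choices for illustration, not the authors' implementation. It computes a plain float for one box pair; training code would express the same arithmetic in an autodiff framework.

```python
import math
from typing import Tuple

# Hypothetical box layout: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def dsgiou_style_loss(pred: Box, gt: Box, eps: float = 1e-9) -> float:
    """Assumed form: 1 - IoU + non-overlap term + center-distance term + aspect term."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection, union, and IoU of the two boxes.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter
    iou = inter / (union + eps)

    # Smallest box C enclosing both boxes: its area and squared diagonal.
    cx1, cy1 = min(px1, gx1), min(py1, gy1)
    cx2, cy2 = max(px2, gx2), max(py2, gy2)
    c_area = (cx2 - cx1) * (cy2 - cy1)
    c_diag2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2

    # 1) Non-overlapping area penalty (GIoU-style): part of C not covered by the union.
    area_term = (c_area - union) / (c_area + eps)
    # 2) Center-distance penalty (DIoU-style): squared center distance over squared diagonal of C.
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    dist_term = rho2 / (c_diag2 + eps)
    # 3) Aspect-ratio penalty (CIoU-style v term, without CIoU's alpha weighting).
    shape_term = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2

    return 1.0 - iou + area_term + dist_term + shape_term


if __name__ == "__main__":
    # Two disjoint unit boxes two units apart: 1 (IoU) + 1/3 (area) + 4/10 (distance) + 0 (shape).
    print(round(dsgiou_style_loss((0, 0, 1, 1), (2, 0, 3, 1)), 4))  # 1.7333
```

The example shows why the extra terms matter: for non-overlapping boxes the plain IoU loss is stuck at 1 regardless of how far apart the boxes are, while the area and distance terms still vary with box placement and so still provide a training signal.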
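The two-direction encoding with average and max pooling can be illustrated with a parameter-free sketch in plain Python. It assumes a `[channels][height][width]` nested-list feature map, fuses the average-pooling and max-pooling branches by simple addition, and replaces the learned 1×1 convolutions, normalization, and nonlinearity of coordinate attention with an identity mapping. The names `dual_branch_coord_attention` and `sigmoid` and the additive fusion rule are hypothetical; the sketch only shows the data flow (pool along each spatial direction, build per-row and per-column weights, reweight every position), not the paper's module.

```python
import math
from typing import List

# Hypothetical layout: feature[channel][row][column].
FeatureMap = List[List[List[float]]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def dual_branch_coord_attention(x: FeatureMap) -> FeatureMap:
    out: FeatureMap = []
    for channel in x:
        h, w = len(channel), len(channel[0])
        columns = [[channel[i][j] for i in range(h)] for j in range(w)]
        # Pool across the width: one average and one max value per row (height position).
        row_avg = [sum(row) / w for row in channel]
        row_max = [max(row) for row in channel]
        # Pool across the height: one average and one max value per column (width position).
        col_avg = [sum(col) / h for col in columns]
        col_max = [max(col) for col in columns]
        # Fuse the two pooling branches (assumed: plain sum) and squash to (0, 1).
        a_h = [sigmoid(a + m) for a, m in zip(row_avg, row_max)]
        a_w = [sigmoid(a + m) for a, m in zip(col_avg, col_max)]
        # Reweight each position by the weight of its row and the weight of its column.
        out.append([[channel[i][j] * a_h[i] * a_w[j] for j in range(w)] for i in range(h)])
    return out


if __name__ == "__main__":
    feature = [[[4.0, 0.0], [0.0, 0.0]]]  # one channel with a single strong response
    print(dual_branch_coord_attention(feature))
```

Because each weight depends on a whole row or a whole column, the output keeps track of where a strong response lies along both spatial directions; the max-pooling branch lets a single salient position raise its row and column weights even when the average over that row or column is small.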

Figures(8) / Tables(9)
