递归门控增强与金字塔预测的铁路全景分割

陈永; 周方春; 张娇娇

doi:10.13700/j.bh.1001-5965.2023.0492

递归门控增强与金字塔预测的铁路全景分割

doi: 10.13700/j.bh.1001-5965.2023.0492

陈永^{1, 2, ,},
周方春¹,
张娇娇¹

1.
兰州交通大学电子与信息工程学院，兰州 730070
2.
甘肃省人工智能与图形图像处理工程研究中心，兰州 730070

基金项目:

国家自然科学基金(62462043,61963023)；兰州交通大学重点研发项目(ZDYF2304)

详细信息

通讯作者:
E-mail：edukeylab@126.com

中图分类号: TP391.4
计量
- 文章访问数: 392
- HTML全文浏览量: 89
- PDF下载量: 14
- 被引次数: 0
出版历程
- 收稿日期: 2023-07-27
- 录用日期: 2023-09-26
- 网络出版日期: 2023-10-07
- 整期出版日期: 2025-07-14

Railway panoramic segmentation based on recursive gating enhancement and pyramid prediction

1.
School of Electronic and Information Engineering，Lanzhou Jiaotong University，Lanzhou 730070，China
2.
Gansu Provincial Engineering Research Center for Artificial Intelligence and Graphics & Image Processing，Lanzhou 730070，China

Funds:

National Natural Science Foundation of China (62462043,61963023); Key Research and Development Project of Lanzhou Jiaotong University (ZDYF2304)

More Information

Corresponding author: E-mail：edukeylab@126.com

摘要

摘要:
针对高速铁路场景全景分割时存在目标特征提取不充分、边缘轮廓分割模糊等问题，提出了一种递归门控增强与金字塔预测的铁路全景分割网络。在DETR模型的基础上，构建改进多尺度级联CSP-DarkNet53特征提取网络，提升对不同尺度的铁路场景目标特征提取能力；提出递归门控与类特征增强模块，获取更丰富的边缘特征信息，增强对边缘轮廓信息的提取和分割的能力；将多尺度可变形注意力引入编码骨干网络中，进一步捕获多尺度上下文信息，减少分割细节特征丢失；通过改进金字塔预测与像素类别分割模块，实现铁路全景的分割输出。实验结果表明：相比于原始DETR模型，所提方法的全景分割质量指标PQ提升了7.4%，前景实例目标评价指标PQ^Th提升了9.7%，背景填充区域质量评价指标PQ^St提升了6.6%。所提方法在铁路场景下图像全景分割具有较好的性能，主观评价均优于对比方法。
- 全景分割 /
- DETR /
- 递归门控增强 /
- 金字塔预测 /
- 高速铁路
Abstract:
A recursive gated enhancement and pyramid prediction railway panoramic segmentation network is proposed to address the issues of insufficient target feature extraction and blurred edge contour segmentation in high-speed railway scene panoramic segmentation. On the basis of the DETR panoramic segmentation model, firstly, an improved multi-scale cascaded CSP-DarkNet53 feature network is constructed to enhance the ability to extract target features from railway scenes of different scales. Then, to improve the ability to extract and segment edge contour information and acquire richer edge feature information, a recursive gating and class feature augmentation module is proposed. Then, deformable attention is introduced into the coding backbone network to further capture context information and reduce the loss of segmentation details. Finally, by improving the pyramid prediction and pixel category segmentation module, the segmentation output of the railway panoramic view is achieved. The suggested approach enhances the original DETR model’s panorama segmentation quality index PQ by 7.4%, the foreground instance target’s panoramic quality index PQ^Th by 9.7%, and the background-filled area’s panoramic quality index PQ^St by 6.6%, according to experimental data. The proposed method has good performance in panoramic image segmentation in railway scenes, and its subjective evaluation is superior to the comparison method.
- panoramic segmentation /
- DETR /
- recursive gating enhancement /
- pyramid prediction /
- high-speed railway

HTML全文

图 1 本文方法整体模型框架

Figure 1. Overall modelling framework of the proposed method

下载: 全尺寸图片幻灯片

图 2 多尺度级联CSP-DarkNet53特征提取网络结构

Figure 2. Structure of multi-scale cascaded CSP-DarkNet53 feature extract network

下载: 全尺寸图片幻灯片

图 3 各级特征图可视化比较

Figure 3. Visual comparison of feature maps at different levels

下载: 全尺寸图片幻灯片

图 4 递归门控与类特征增强模块

Figure 4. Recursive gating and class feature enhancement module

下载: 全尺寸图片幻灯片

图 5 类别关注度特征热力图可视化

Figure 5. Visualization of heat maps based on class attention feature

下载: 全尺寸图片幻灯片

图 6 类特征增强特征可视化

Figure 6. Visualization of class feature enhancement

下载: 全尺寸图片幻灯片

图 7 多尺度可变形注意力机制示意图

Figure 7. Schematic of multi-scale deformable attention mechanism

下载: 全尺寸图片幻灯片

图 8 混合域注意力机制结构

Figure 8. Architecture of mixed domain attention mechanism

下载: 全尺寸图片幻灯片

图 9 不同铁路场景下全景分割实验结果

Figure 9. Panoramic segmentation experiment results in different railway scenes

下载: 全尺寸图片幻灯片

图 10 铁路场景下全景分割局部放大结果

Figure 10. Partial enlarged diagram for panoramic segmentation in railway scenes

下载: 全尺寸图片幻灯片

图 11 铁路山体滑坡全景分割实验结果

Figure 11. Experiment results on panoramic segmentation of railway landslides

下载: 全尺寸图片幻灯片

图 12 铁路山体滑坡全景分割局部放大结果

Figure 12. Partial enlarged map for panoramic segmentation of railway landslides

下载: 全尺寸图片幻灯片

表 1 不同全景分割方法性能对比

Table 1. Performance comparison of different panoramic segmentation methods %

方法	PQ	PQ^Th	PQ^St
文献[6]	48.4	52.6	41.0
DETR	47.2	51.7	40.3
本文	54.6	61.4	46.9

下载: 导出CSV

表 2 模型消融实验比较

Table 2. Comparison of ablation experimental of model

基准	多尺度级联 CSP-DarkNet53 特征提取网络	递归门控与类特征增强模块	多尺度可变形注意力编码器	改进金字塔预测网络	PQ/%
√					47.2
√	√				48.5
√	√	√			51.1
√	√	√	√		52.5
√	√	√	√	√	54.6

下载: 导出CSV

表 3 模型计算量对比实验结果

Table 3. Comparative experiment results of model calculation

方法	计算量
文献[6]	157 GFLOPs
DETR	93 GFLOPs
本文	223 GFLOPs
注：GFLOPs为模型计算量的标准单位，表示10亿次浮点运算。

下载: 导出CSV

参考文献(25)

[1]	陈永, 王镇, 周方春. 空间定位与特征泛化增强的铁路异物跟踪检测[J]. 北京亚洲成人在线一二三四五六区学报, 2025, 51(1): 9-18. CHEN Y, WANG Z, ZHOU F C. Infrared railway foreign objects tracking based on spatial location and feature generalization enhancement[J]. Journal of Beijing University of Aeronautics and Astronautics, 2025, 51(1): 9-18(in Chinese).
[2]	LI C, XIE Z Y, QIN Y, et al. A multi-scale image and dynamic candidate region-based automatic detection of foreign targets intruding the railway perimeter[J]. Measurement, 2021, 185: 109853. doi: 10.1016/j.measurement.2021.109853
[3]	路通, 余祖俊, 郭保青, 等. 网状多尺度与双向通道注意力的铁路场景语义分割[J]. 交通运输系统工程与信息, 2023, 23(2): 233-241. LU T, YU Z J, GUO B Q, et al. Semantic segmentation of railway scene based on reticulated multi-scale and bidirectional channel attention[J]. Journal of Transportation Systems Engineering and Information Technology, 2023, 23(2): 233-241(in Chinese).
[4]	KIRILLOV A, HE K, GIRSHICK R, et al. Panoptic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 9396-9405.
[5]	KIRILLOV A, GIRSHICK R, HE K M, et al. Panoptic feature pyramid networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 6392-6401.
[6]	ZHANG W, PANG J, CHEN K, et al. K-Net: towards unified image segmentation[EB/OL]. (2021-10-01)[2023-07-01]. http://arxiv.org/abs/2106.14855?context=cs.AI.
[7]	CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with Transformers[C]//Proceedings of the 16th European Conference on Computer Vision. Berlin: Springer, 2020: 213-229.
[8]	CHENG B, SCHWING A G, KIRILLOV A. Per-pixel classifica tion is not all you need for semantic segmentation[EB/OL]. (2021-10-31)[2023-07-01]. http://arxiv.org/abs/2107.06278?context=cs.
[9]	WANG H Y, ZHU Y K, ADAM H, et al. MaX-DeepLab: end-to-end panoptic segmentation with mask Transformers[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 5459-5470.
[10]	CHENG B W, COLLINS M D, ZHU Y K, et al. Panoptic-DeepLab: a simple, strong, and fast baseline for bottom-up panoptic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 12472-12482.
[11]	CHENG B W, MISRA I, SCHWING A G, et al. Masked-attention mask Transformer for universal image segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2022: 1280-1289.
[12]	YANG T J, COLLINS M D, ZHU Y, et al. DeeperLab: single-shot image parser[C]//Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 1902-5093.
[13]	毛琳, 任凤至, 杨大伟, 等. 实例特征深度链式学习全景分割网络[J]. 光学精密工程, 2020, 28(12): 2665. doi: 10.37188/OPE.20202812.2665 MAO L, REN F Z, YANG D W, et al. INFNet: deep instance feature chain learning network for panoptic segmentation[J]. Optics and Precision Engineering, 2020, 28(12): 2665(in Chinese). doi: 10.37188/OPE.20202812.2665
[14]	宁欣, 田伟娟, 于丽娜, 等. 面向小目标和遮挡目标检测的脑启发CIRA-DETR全推理方法[J]. 计算机学报, 2022, 45(10): 2080-2092. doi: 10.11897/SP.J.1016.2022.02080 NING X, TIAN W J, YU L N, et al. Brain-inspired CIRA-DETR full inference model for small and occluded object detection[J]. Chinese Journal of Computers, 2022, 45(10): 2080-2092(in Chinese). doi: 10.11897/SP.J.1016.2022.02080
[15]	CHEN Y T, CHEN S F. Localizing plucking points of tea leaves using deep convolutional neural networks[J]. Computers and Electronics in Agriculture, 2020, 171: 105298. doi: 10.1016/j.compag.2020.105298
[16]	ZHAO Z Q, ZHENG P, XU S T, et al. Object detection with deep learning: a review[J]. IEEE Transactions on Neural Networks and Learning Systems, 2019, 30(11): 3212-3232. doi: 10.1109/TNNLS.2018.2876865
[17]	HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 2261-2269.
[18]	WANG Z M, WANG J S, YANG K, et al. Semantic segmentation of high-resolution remote sensing images based on a class feature attention mechanism fused with Deeplabv3+[J]. Computers and Geosciences, 2022, 158: 104969. doi: 10.1016/j.cageo.2021.104969
[19]	RAO Y M, ZHAO W L, TANG Y S, et al. HorNet: efficient high-order spatial interactions with recursive gated convolutions[EB/OL]. (2022-10-11)[2023-07-01]. http://arxiv.org/abs/2207.14284v3.
[20]	冷冰, 冷敏, 常智敏, 等. 基于Transformer结构的深度学习模型用于外周血白细胞检测[J]. 仪器仪表学报, 2023, 44(5): 113-120. LENG B, LENG M, CHANG Z M, et al. Deep learning model based on Transformer architecture for peripheral blood leukocyte detection[J]. Chinese Journal of Scientific Instrument, 2023, 44(5): 113-120(in Chinese).
[21]	ZHU X, SU W, LU L, et al. Deformable DETR: deformable Transformers for end-to-end object detection[EB/OL]. (2021-05-18)[2023-07-01]. http://arxiv.org/abs/2010.04159?context=cs.
[22]	李飞, 胡坤, 张勇, 等. 基于混合域注意力YOLOv4的输送带纵向撕裂多维度检测[J]. 浙江大学学报(工学版), 2022, 56(11): 2156-2167. LI F, HU K, ZHANG Y, et al. Multi-dimensional detection of longitudinal tearing of conveyor belt based on YOLOv4 of hybrid domain attention[J]. Journal of Zhejiang University (Engineering Science), 2022, 56(11): 2156-2167(in Chinese).
[23]	WANG Q L, WU B G, ZHU P F, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 11531-11539.
[24]	YING X Y, WANG Y Q, WANG L G, et al. A stereo attention module for stereo image super-resolution[J]. IEEE Signal Processing Letters, 2020, 27: 496-500. doi: 10.1109/LSP.2020.2973813
[25]	REZATOFIGHI H, TSOI N, GWAK J, et al. Generalized intersection over union: a metric and a loss for bounding box regression[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 658-666.