| Citation: | YANG J, ZHANG J Y. U-shaped semantic segmentation network of high-resolution remote sensing images embedded with self-attention mechanism[J]. Journal of Beijing University of Aeronautics and Astronautics, 2025, 51(5): 1514-1527 (in Chinese). doi: 10.13700/j.bh.1001-5965.2023.0269 |
Small objects are difficult to extract from high-resolution remote sensing images. To address this, a dual-encoder feature fusion network that combines a convolutional structure with a self-attention mechanism is proposed for semantic segmentation of such images. First, a dual-encoder structure extracts both global information and local detail from the images, improving segmentation accuracy for small objects. Second, a feature aggregation module aggregates features from different stages to embed more global contextual information. Finally, an edge thinning loss module improves the model's ability to recognize object edges. On the ISPRS Vaihingen and Potsdam datasets, the mean F1 score ($m_{F_1}$) of the segmentation results reaches 91.28% and 93.16%, respectively. Compared with current mainstream algorithms, both the segmentation accuracy for small objects such as cars and the overall segmentation accuracy are improved. The proposed model thus alleviates, to a certain extent, the inaccurate segmentation of small objects and object edges in semantic segmentation of high-resolution remote sensing images.
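The abstract reports results as a mean F1 score. As a minimal sketch of how such a score is commonly computed, the snippet below derives per-class F1 from a pixel-level confusion matrix and averages it over classes. The function names `per_class_f1` and `mean_f1` and the toy matrix are illustrative assumptions, not taken from the paper, and the abstract does not state the evaluation protocol details (for example, which classes are included in the average), so this should not be read as the authors' exact procedure.

```python
import numpy as np


def per_class_f1(conf):
    """Per-class F1 from a confusion matrix.

    conf[i, j] is the number of pixels whose true class is i and whose
    predicted class is j (an assumed layout, chosen for illustration).
    """
    conf = np.asarray(conf, dtype=np.float64)
    tp = np.diag(conf)                # correctly labelled pixels per class
    fp = conf.sum(axis=0) - tp        # predicted as the class, but wrong
    fn = conf.sum(axis=1) - tp        # belong to the class, but missed
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); classes with no pixels at all score 0.
    return np.divide(2 * tp, denom, out=np.zeros_like(denom), where=denom > 0)


def mean_f1(conf):
    """Unweighted average of the per-class F1 scores."""
    return float(per_class_f1(conf).mean())


if __name__ == "__main__":
    # Hypothetical 2-class confusion matrix, for illustration only.
    toy = [[1, 1],
           [0, 2]]
    print(per_class_f1(toy), mean_f1(toy))
```

Because the average is unweighted, a small class such as cars contributes as much to the mean as a large class such as buildings, which is one reason the metric is sensitive to small-object accuracy.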