Volume 51 Issue 7
Jul.  2025
Turn off MathJax
Article Contents
CHEN Y D,ZHAO Y B,WU E H. Robust semi-supervised video object segmentation based on dynamic embedding feature[J]. Journal of Beijing University of Aeronautics and Astronautics,2025,51(7):2253-2261 (in Chinese) doi: 10.13700/j.bh.1001-5965.2023.0354
Citation: CHEN Y D,ZHAO Y B,WU E H. Robust semi-supervised video object segmentation based on dynamic embedding feature[J]. Journal of Beijing University of Aeronautics and Astronautics,2025,51(7):2253-2261 (in Chinese) doi: 10.13700/j.bh.1001-5965.2023.0354

Robust semi-supervised video object segmentation based on dynamic embedding feature

doi: 10.13700/j.bh.1001-5965.2023.0354
Funds:

National Natural Science Foundation of China (62473201,62477026); Preliminary Research Project on Leading Technologies by Wuxi Industrial Innovation Research Institute

More Information
  • Corresponding author: E-mail:adamchen@nuist.edu.cn
  • Received Date: 13 Jun 2023
  • Accepted Date: 01 Dec 2023
  • Available Online: 12 Jan 2024
  • Publish Date: 10 Jan 2024
  • A semi-supervised video object segmentation (VOS) method was proposed to address the issues of increasing memory consumption during inference and the difficulty of training relying solely on low-level pixel features. The method is based on dynamic embedding features and an auxiliary loss function. First, a dynamic embedding feature was employed to establish a constant-sized memory bank. Through spatiotemporal aggregation, historical information was utilized to generate and update dynamic embedding features. Simultaneously, a memory update sensor was employed to adaptively control the update interval of the memory bank, accommodating different motion patterns in various videos. Second, an auxiliary loss function was utilized to provide the network with guidance at the high semantic feature level, enhancing model accuracy and training efficiency by offering diverse guidance across multiple feature levels. Finally, to address the issue of misalignment between similar objects in the foreground and background of videos, a spatial constraint module was designed, which leveraged the temporal continuity of videos to better capture the correlation between the mask from the previous frame and the current frame. Experimental results demonstrate that the proposed method achieves an accuracy of 84.5% J&F on the DAVIS 2017 validation set and 82.4% J&F on the YouTube-VOS 2019 validation set.

     

  • loading
  • [1]
    PERAZZI F, KHOREVA A, BENENSON R, et al. Learning video object segmentation from static images[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 3491-3500.
    [2]
    VOIGTLAENDER P, LEIBE B. Online adaptation of convolutional neural networks for video object segmentation[EB/OL]. (2017-08-01)[2023-06-01]. http://arxiv.org/abs/1706.09364v2.
    [3]
    CAELLES S, MANINIS K K, PONT-TUSET J, et al. One-shot video object segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 5320-5329.
    [4]
    YANG Z X, WEI Y C, YANG Y. Collaborative video object segmentation by foreground-background integration[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2020: 332-348.
    [5]
    YANG Z X, WEI Y C, YANG Y. Collaborative video object segmentation by multi-scale foreground-background integration[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(9): 4701-4712.
    [6]
    ZHANG P, HU L, ZHANG B, et al. Spatial consistent memory network for semi-supervised video object segmentation[C]//Proceedings of the DAVIS Challenge on Video Object Segmentation. Piscataway: IEEE Press, 2020: 1-4.
    [7]
    SEONG H, HYUN J, KIM E. Kernelized memory network for video object segmentation[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2020: 629-645.
    [8]
    SEONG H, OH S W, LEE J Y, et al. Hierarchical memory matching network for video object segmentation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2021: 12869-12878.
    [9]
    LI M X, HU L, XIONG Z W, et al. Recurrent dynamic embedding for video object segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2022: 1322-1331.
    [10]
    OH S W, LEE J Y, XU N, et al. Video object segmentation using space-time memory networks[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 9226-9235.
    [11]
    CHENG H K, TAI Y W, TANG C K. Rethinking space-time networks with improved memory coverage for efficient video object segmentation[EB/OL]. (2021-10-08)[2023-06-01]. http://arxiv.org/abs/2106.05210?context=cs.CV.
    [12]
    CHENG H K, TAI Y W, TANG C K. Modular interactive video object segmentation: interaction-to-mask, propagation and difference-aware fusion[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 5555-5564.
    [13]
    XIE H Z, YAO H X, ZHOU S C, et al. Efficient regional memory network for video object segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 1286-1295.
    [14]
    季传俊, 陈亚当, 车洵. 融合视觉词与自注意力机制的视频目标分割[J]. 中国图象图形学报, 2022, 27(8): 2444-2457.

    JI C J, CHEN Y D, CHE X. Visual words and self-attention mechanism fusion based video object segmentation method[J]. Journal of Image and Graphics, 2022, 27(8): 2444-2457(in Chinese).
    [15]
    征煜, 陈亚当, 郝川艳. 特征一致性约束的视频目标分割[J]. 中国图象图形学报, 2020, 25(8): 1558-1566. doi: 10.11834/jig.190571

    ZHENG Y, CHEN Y D, HAO C Y. Video object segmentation algorithm based on consistent features[J]. Journal of Image and Graphics, 2020, 25(8): 1558-1566(in Chinese). doi: 10.11834/jig.190571
    [16]
    LI Y, SHEN Z R, SHAN Y. Fast video object segmentation using the global context module[C]//Proceedings of the European Conferenceon Computer Vision. Berlin: Springer, 2020: 735-750.
    [17]
    WANG H C, JIANG X L, REN H B, et al. SwiftNet: real-time video object segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 1296-1305.
    [18]
    LIANG Y Q, LI X, JAFARI N, et al. Video object segmentation with adaptive feature bank and uncertain-region refinement[EB/OL]. (2020-10-15)[2023-06-01]. http://arxiv.org/abs/2010.07958v1.
    [19]
    CHEN Y D, HAO C Y, YANG Z X, et al. Fast target-aware learning for few-shot video object segmentation[J]. Science China Information Sciences, 2022, 65(8): 182104. doi: 10.1007/s11432-021-3396-7
    [20]
    CHO S, LEE H, LEE M, et al. Tackling background distraction in video object segmentation[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2022: 446-462.
    [21]
    LUITEN J, VOIGTLAENDER P, LEIBE B. PReMVOS: proposal-generation, refinement and merging for video object segmentation[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2019: 565-580.
    [22]
    DUKE B, AHMED A, WOLF C, et al. SSTVOS: sparse spatiotemporal Transformers for video object segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 5908-5917.
    [23]
    GE W B, LU X K, SHEN J B. Video object segmentation using global and instance embedding learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 16831-16840.
    [24]
    LIU Y, YU R, YIN F, et al. Learning quality-aware dynamic memory for video object segmentation[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2022: 468-486.
    [25]
    CHENG H K, SCHWING A G. XMem: long-term video object segmentation with an Atkinson-Shiffrin memory model[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2022: 640-658.
    [26]
    LAN M, ZHANG J, HE F X, et al. Siamese network with interactive Transformer for video object segmentation[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2022, 36(2): 1228-1236. doi: 10.1609/aaai.v36i2.20009
    [27]
    CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848. doi: 10.1109/TPAMI.2017.2699184
    [28]
    陈亚当, 陈柳任, 余文斌, 等. 多尺度特征融合的知识蒸馏异常检测方法[J]. 计算机辅助设计与图形学学报, 2022, 34(10): 1542-1549.

    CHEN Y D, CHEN L R, YU W B, et al. Knowledge distillation anomaly detection with multi-scale feature fusion[J]. Journal of Computer-Aided Design & Computer Graphics, 2022, 34(10): 1542-1549(in Chinese).
    [29]
    WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 3-19.
  • 加载中

Catalog

    通讯作者: 陈斌, bchen63@163.com
    • 1. 

      沈阳化工大学材料科学与工程学院 沈阳 110142

    1. 本站搜索
    2. 百度学术搜索
    3. 万方数据库搜索
    4. CNKI搜索

    Figures(7)  / Tables(6)

    Article Metrics

    Article views(364) PDF downloads(10) Cited by()
    Proportional views
    Related

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return