
Continual learning method based on differential feature distillation for multimodal network

HE Chiyuan, CHENG Shaoxu, XU Linfeng, MENG Fanman, WU Qingbo

Citation: HE C Y, CHENG S X, XU L F, et al. Continual learning method based on differential feature distillation for multimodal network[J]. Journal of Beijing University of Aeronautics and Astronautics, 2025, 51(7): 2460-2467 (in Chinese). doi: 10.13700/j.bh.1001-5965.2023.0369


doi: 10.13700/j.bh.1001-5965.2023.0369

Details
    Corresponding author: E-mail: lfxu@uestc.edu.cn

  • CLC number: TP391


Funds: 

National Science and Technology Major Project for New-Generation Artificial Intelligence (2021ZD0112001); National Natural Science Foundation of China (62071086); Innovation Research Group Project of the Natural Science Foundation of Sichuan Province (2023NSFSC1972)

  • Abstract:

    Continual learning has recently become a new research hotspot, but in continual learning with multimodal architectures the data cannot be fully exploited, which leads to severe catastrophic forgetting and impeded learning. To address this, a multimodal continual learning method based on feature distillation is proposed. The method focuses on how differently the modalities perform on the tasks and accordingly retains more or less of each modality's old knowledge, so as to stimulate each modality's potential to mine discriminative features from a holistic perspective. Experiments on the multimodal activity recognition dataset UESTC-MMEA-CL verify the effectiveness of the proposed method. By the eighth task, its average accuracy is 22.0% higher than fine-tuning and 20.1% higher than learning without forgetting (LwF). Compared with classical knowledge distillation, the proposed differential feature distillation markedly improves the utilization of the sensor modalities and thus alleviates catastrophic forgetting in multimodal networks more effectively.
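
    The abstract names the mechanism but not its equations, so the following is only a minimal PyTorch sketch of the idea, under assumptions: the per-modality loss form (MSE between old and new features) and the weight values are illustrative, not taken from the paper. What it shows is the differential part: each modality's old features are preserved with its own strength rather than with one shared distillation weight.

```python
import torch
import torch.nn.functional as F

def differential_feature_distillation(old_feats, new_feats, weights):
    """Sum per-modality feature-distillation terms, each scaled by a
    modality-specific retention weight. old_feats come from the frozen
    model trained on previous tasks; new_feats from the current model."""
    loss = torch.zeros(())
    for m, w in weights.items():
        # A larger w preserves more of modality m's old knowledge;
        # a smaller w lets that modality drift toward the new tasks.
        loss = loss + w * F.mse_loss(new_feats[m], old_feats[m].detach())
    return loss

# Illustrative call with assumed modality names, dims, and weights.
old = {"rgb": torch.randn(8, 1024), "acc": torch.randn(8, 128), "gyro": torch.randn(8, 128)}
new = {m: t + 0.1 * torch.randn_like(t) for m, t in old.items()}
print(differential_feature_distillation(old, new, {"rgb": 0.8, "acc": 0.3, "gyro": 0.3}))
```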

     

  • Figure 1. Schematic of multimodal continual behavior recognition

    Figure 2. Diagram of differential feature distillation

    Figure 3. Comparison of forgetting curves

    Figure 4. Comparison of forgetting on the initial task

    Figure 5. Comparison of modality utilization

    Table 1. Network parameter settings

    Modality        Input shape      Feature extractor    Fusion module     Classifier
    Video frames    [8,3,224,224]    BN-Inception[19]     Concat+FC+ReLU    FC+Softmax
    Accelerometer   [8,3,64]         DeepConvLSTM[20]     (shared)          (shared)
    Gyroscope       [8,3,64]         DeepConvLSTM[20]     (shared)          (shared)

    Note: FC denotes a fully connected layer; the three modalities share one fusion module and one classifier.
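
    Table 1 fixes the architecture: per-modality extractors (BN-Inception[19] for video, DeepConvLSTM[20] for the two inertial sensors) feed one shared Concat+FC+ReLU fusion module and an FC+Softmax classifier. Below is a minimal PyTorch sketch of that shared head; the feature dimensions and hidden size are placeholders, not values from the paper, and the Softmax is left to the cross-entropy loss as is conventional.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Concat + FC + ReLU fusion followed by an FC classifier (Table 1).
    Per-modality feature dims are illustrative placeholders."""

    def __init__(self, dims=(1024, 128, 128), hidden=512, num_classes=32):
        super().__init__()
        self.fuse = nn.Sequential(nn.Linear(sum(dims), hidden), nn.ReLU())
        self.classify = nn.Linear(hidden, num_classes)  # Softmax folded into the loss

    def forward(self, video_feat, acc_feat, gyro_feat):
        # Concatenate the three modality features along the channel axis
        fused = self.fuse(torch.cat([video_feat, acc_feat, gyro_feat], dim=1))
        return self.classify(fused)

head = FusionClassifier()
logits = head(torch.randn(8, 1024), torch.randn(8, 128), torch.randn(8, 128))
print(logits.shape)  # torch.Size([8, 32])
```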

    Table 2. Ablation comparison

    Model + continual learning strategy                          Average accuracy/%
                                                                 8×4      4×8
    Single modality (video) + LwF                                29.3     53.2
    Three modalities + LwF                                       25.8     39.0
    Three modalities + LwF + feature distillation                30.6     52.5
    Three modalities + LwF + differential feature distillation   46.3     58.9

    Note: "8×4" splits all 32 behavior classes into 8 tasks of 4 classes each; "4×8" splits them into 4 tasks of 8 classes each (see the sketch after this table).
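
    The 8×4 and 4×8 protocols in the note above are ordinary class-incremental splits of the 32 UESTC-MMEA-CL classes. A small sketch of how such a split is built (the sequential class ordering is an assumption; the paper may shuffle classes before splitting):

```python
def class_incremental_split(num_classes=32, classes_per_task=4):
    """Split class IDs 0..num_classes-1 into consecutive tasks.

    classes_per_task=4 -> the 8x4 protocol (8 tasks of 4 classes);
    classes_per_task=8 -> the 4x8 protocol (4 tasks of 8 classes).
    """
    ids = list(range(num_classes))
    return [ids[i:i + classes_per_task] for i in range(0, num_classes, classes_per_task)]

print(len(class_incremental_split(32, 4)))  # 8 tasks
print(class_incremental_split(32, 8)[0])    # first task: classes 0..7
```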

    Table 3. Fusion method comparison

    Model + continual learning strategy   Accuracy/%
                                          32 classes   8×4
    Three modalities, MAFnet[22]          95.4         12.2
    Three modalities, MM-GCN[23]          96.0         16.1
    Three modalities, Concat+FC+ReLU      95.6         58.9

    Table 4. Effectiveness of the debiasing coefficient (three modalities)

    Accumulation scheme      Accuracy/%
                             Task 1   Task 2   Task 3        Task 4
    Direct accumulation      98.3     82.5     62.2          53.0
    Debiased accumulation    98.3     82.5     68.2 (+6.0)   58.9 (+5.9)

    Note: values in parentheses are gains, in percentage points, over direct accumulation.
  • [1] XU L F, WU Q B, PAN L L, et al. Towards continual egocentric activity recognition: a multi-modal egocentric activity dataset for continual learning[EB/OL]. (2023-01-26)[2023-06-01]. http://arxiv.org/abs/2301.10931v1.
    [2] KAZAKOS E, NAGRANI A, ZISSERMAN A, et al. EPIC-fusion: audio-visual temporal binding for egocentric action recognition[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 5491-5500.
    [3] DAMEN D M, DOUGHTY H, FARINELLA G M, et al. Rescaling egocentric vision: collection, pipeline and challenges for EPIC-KITCHENS-100[J]. International Journal of Computer Vision, 2022, 130(1): 33-55. doi: 10.1007/s11263-021-01531-2
    [4] SPRIGGS E H, DE LA TORRE F, HEBERT M. Temporal segmentation and activity classification from first-person sensing[C]//Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE Press, 2009: 17-24.
    [5] CHEN C, JAFARI R, KEHTARNAVAZ N. UTD-MHAD: a multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor[C]//Proceedings of the IEEE International Conference on Image Processing. Piscataway: IEEE Press, 2015: 168-172.
    [6] SONG S B, CHANDRASEKHAR V, MANDAL B, et al. Multimodal multi-stream deep learning for egocentric activity recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE Press, 2016: 378-385.
    [7] NAKAMURA K, YEUNG S, ALAHI A, et al. Jointly learning energy expenditures and activities using egocentric multimodal signals[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 6817-6826.
    [8] KIRKPATRICK J, PASCANU R, RABINOWITZ N, et al. Overcoming catastrophic forgetting in neural networks[J]. Proceedings of the National Academy of Sciences of the United States of America, 2017, 114(13): 3521-3526.
    [9] LIU X L, MASANA M, HERRANZ L, et al. Rotate your networks: better weight consolidation and less catastrophic forgetting[C]//Proceedings of the 24th International Conference on Pattern Recognition. Piscataway: IEEE Press, 2018: 2262-2268.
    [10] ZENKE F, POOLE B, GANGULI S. Continual learning through synaptic intelligence[C]//Proceedings of the 34th International Conference on Machine Learning. Sydney: PMLR, 2017, 70: 3987-3995.
    [11] LI Z Z, HOIEM D. Learning without forgetting[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(12): 2935-2947. doi: 10.1109/TPAMI.2017.2773081
    [12] REBUFFI S A, KOLESNIKOV A, SPERL G, et al. iCaRL: incremental classifier and representation learning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 5533-5542.
    [13] DOUILLARD A, CORD M, OLLION C, et al. PODNet: pooled outputs distillation for small-tasks incremental learning[C]//European Conference on Computer Vision. Berlin: Springer, 2020: 86-102.
    [14] KANG M, PARK J, HAN B. Class-incremental learning by knowledge distillation with adaptive feature consolidation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2022: 16050-16059.
    [15] YAN S P, XIE J W, HE X M. DER: dynamically expandable representation for class incremental learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 3013-3022.
    [16] ZHOU D W, WANG Q W, YE H J, et al. A model or 603 exemplars: towards memory-efficient class-incremental learning[C]//Proceedings of the 11th International Conference on Learning Representations. Kigali: ICLR, 2023.
    [17] WANG F Y, ZHOU D W, YE H J, et al. FOSTER: feature boosting and compression for class-incremental learning[C]//European Conference on Computer Vision. Berlin: Springer, 2022: 398-414.
    [18] ZHU K, ZHAI W, CAO Y, et al. Self-sustaining representation expansion for non-exemplar class-incremental learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2022: 9286-9295.
    [19] IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]//Proceedings of the 32nd International Conference on Machine Learning. Lille: PMLR, 2015, 37: 448-456.
    [20] ORDÓÑEZ F J, ROGGEN D. Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition[J]. Sensors, 2016, 16(1): 115. doi: 10.3390/s16010115
    [21] QIAN N. On the momentum term in gradient descent learning algorithms[J]. Neural Networks, 1999, 12(1): 145-151. doi: 10.1016/S0893-6080(98)00116-6
    [22] BROUSMICHE M, ROUAT J, DUPONT S. Multi-level attention fusion network for audio-visual event recognition[EB/OL]. (2021-06-12)[2023-06-01]. http://arxiv.org/abs/2106.06736v1.
    [23] SHI Z S, LIANG J, LI Q Q, et al. Multi-modal multi-action video recognition[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2021: 13658-13667.
Publication history
  • Received: 2023-06-15
  • Accepted: 2023-12-01
  • Published online: 2024-02-26
  • Issue published: 2025-07-31
