
Semantic information-guided multi-label image classification

黄俊 范浩东 洪旭东 李雪

Citation: HUANG J, FAN H D, HONG X D, et al. Semantic information-guided multi-label image classification[J]. Journal of Beijing University of Aeronautics and Astronautics, 2025, 51(7): 2271-2281 (in Chinese). doi: 10.13700/j.bh.1001-5965.2023.0382

doi: 10.13700/j.bh.1001-5965.2023.0382

Funds: 

National Natural Science Foundation of China (61806005); The University Synergy Innovation Program of Anhui Province (GXXT-2022-052); Outstanding Young Talents Support Program of Anhui Province (gxyqZD2022032); Natural Science Foundation of the Educational Commission of Anhui Province (KJ2021A0373); Open Project of Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Anhui University (MMC202008); Natural Science Foundation of Anhui Province (2008085QF305)

    Corresponding author:

    E-mail: huangjun.cs@ahut.edu.cn

  • CLC number: TP37; TP183; TP181
  • Abstract:

    Multi-label image classification aims to predict a set of labels for a given input image. Existing studies based on semantic information mainly exploit the correlation between the semantic and visual spaces to guide feature extraction and produce effective feature representations, or exploit the correlation between the semantic and label spaces to learn weighted classifiers that capture label correlations; they fail to model the correlations among the semantic, visual, and label spaces simultaneously. To address this problem, a semantic information-guided multi-label image classification (SIG-MLIC) method is proposed. SIG-MLIC exploits the semantic, visual, and label spaces jointly: a semantic-guided attention (SGA) mechanism strengthens the association between labels and image regions to generate semantic-specific feature representations, and the semantic information of labels is used to build a semantic dictionary with label-correlation constraints that reconstructs the visual features, with the normalized representation coefficients taken as the probabilities of label occurrence. Experimental results on three standard multi-label image classification datasets show that the attention mechanism and dictionary learning in SIG-MLIC effectively improve classification performance, verifying the effectiveness of the proposed method.
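    Below is a minimal NumPy sketch of the two ideas the abstract describes, not the authors' implementation: (i) a semantic-guided attention step in which each label's word embedding attends over spatial visual features, and (ii) reconstruction of the resulting label-specific features with a semantic dictionary, whose normalized coefficients act as label scores. All shapes, projection matrices, the ridge-style solver, and function names are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def semantic_guided_attention(visual, semantic, W_v, W_s):
    """visual: (HW, Dv) region features; semantic: (C, Ds) label word embeddings.
    Each label attends over image regions, giving a (C, Dv) semantic-specific feature."""
    q = semantic @ W_s                                      # (C, d)  project label embeddings
    k = visual @ W_v                                        # (HW, d) project region features
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)   # (C, HW) label-to-region attention
    return attn @ visual                                    # (C, Dv) attention-weighted features

def dictionary_label_scores(features, dictionary, lam=0.1):
    """Reconstruct each label-specific feature with a semantic dictionary D of shape (Dv, C)
    via ridge regression; the normalized self-coefficients serve as label scores."""
    D = dictionary
    # Closed-form ridge solution of min_A ||F^T - D A||^2 + lam ||A||^2
    A = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ features.T)  # (C, C)
    coeff = np.abs(np.diag(A))          # weight of label c's own atom for feature c
    return coeff / (coeff.sum() + 1e-12)

# Toy dimensions: 7x7 ResNet-101 grid, 300-d GloVe-style label embeddings, 20 VOC classes.
rng = np.random.default_rng(0)
HW, Dv, Ds, C, d = 49, 2048, 300, 20, 512
visual   = rng.normal(size=(HW, Dv))
semantic = rng.normal(size=(C, Ds))
W_v, W_s = rng.normal(size=(Dv, d)), rng.normal(size=(Ds, d))

feats  = semantic_guided_attention(visual, semantic, W_v, W_s)
scores = dictionary_label_scores(feats, rng.normal(size=(Dv, C)))
print(scores.shape)  # (20,) one score per label
```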

     

  • Figure 1  Framework of the proposed SIG-MLIC method

    Figure 2  Schematic diagram of the SGA module

    Figure 3  Visualization examples of semantic feature attention maps

    Figure 4  Comparison of different $ \alpha $, $ \beta $ and $ \delta $ values on the Pascal VOC 2007 dataset

    Table 1  Comparison of different methods on the Pascal VOC 2007 dataset (per-class AP and mAP, %)

    Method aero bike bird boat bottle bus car cat chair cow table dog horse motor person plant sheep sofa train TV mAP
    CPCL[6] 99.6 98.6 98.5 98.8 81.9 95.1 97.8 98.2 83.0 95.5 85.5 98.4 98.5 97.0 99.0 86.6 97.0 84.9 99.1 94.3 94.4
    SSGRL[7]* 99.5 97.1 97.6 97.8 82.6 94.8 96.7 98.1 78.0 97.0 85.6 97.8 98.3 96.4 98.8 84.9 96.5 79.8 98.4 92.8 93.4
    SSGRL(pre)[7]* 99.7 98.4 98.0 97.6 85.7 96.2 98.2 98.8 82.0 98.1 89.7 98.8 98.7 97.0 99.0 86.9 98.1 85.8 99.0 93.7 95.0
    MSRN[19] 100.0 98.8 98.9 99.1 81.6 95.5 98.0 98.2 84.4 96.6 87.5 98.6 98.6 97.2 99.1 87.0 97.6 86.5 99.4 94.4 94.9
    MSRN(pre)[19] 99.7 98.9 98.7 99.1 86.6 97.9 98.5 98.9 86.0 98.7 89.1 99.0 99.1 97.3 99.2 90.2 99.2 89.7 99.8 95.3 96.0
    ML-GCN[8] 99.5 98.5 98.6 98.1 80.8 94.6 97.2 98.2 82.3 95.7 86.4 98.2 98.4 96.7 99.0 84.7 96.7 84.3 98.9 93.7 94.0
    DSDL[29] 99.8 98.7 98.4 97.9 81.9 95.4 97.6 98.3 83.3 95.0 88.6 98.0 97.9 95.8 99.0 86.6 95.9 86.4 98.6 94.4 94.4
    MCAR[21] 99.7 99.0 98.5 98.2 85.4 96.9 97.4 98.9 83.7 95.0 88.8 99.1 98.2 95.1 99.1 84.8 97.1 87.8 98.3 94.8 94.8
    SIG-MLIC 99.9 98.6 98.9 98.3 83.6 98.2 97.9 99.1 82.7 97.5 89.6 99.3 99.1 98.8 99.1 86.9 99.5 86.6 99.4 97.2 95.5
    SIG-MLIC(pre) 100.0 99.1 99.4 98.7 88.2 98.2 98.8 99.6 85.0 98.8 92.0 99.7 99.5 99.3 99.3 88.8 99.7 89.4 99.6 97.2 96.5
     Note: Bold numbers indicate the best value and underlined numbers the second-best; "pre" means the model was pre-trained on the MS-COCO dataset; * indicates an input image size of 576×576 pixels.
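    The AP and mAP columns in Tables 1 and 2 are the standard per-class average precision and its mean over classes. As a reminder of how they are computed, here is a short sketch using scikit-learn; the ground-truth and score arrays are placeholders for illustration, not the paper's predictions.

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Placeholder ground truth and predicted scores for 4 images and 3 labels.
y_true  = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
y_score = np.array([[0.9, 0.2, 0.8], [0.1, 0.7, 0.3], [0.8, 0.6, 0.2], [0.3, 0.1, 0.9]])

# Per-class AP: area under the precision-recall curve of each label column.
ap = [average_precision_score(y_true[:, c], y_score[:, c]) for c in range(y_true.shape[1])]
print([round(a, 3) for a in ap])      # one AP value per class, as in the per-class columns
print(round(float(np.mean(ap)), 3))   # mAP: mean of the per-class APs
```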

    Table 2  Comparison of different methods on the Pascal VOC 2012 dataset (per-class AP and mAP, %)

    Method aero bike bird boat bottle bus car cat chair cow table dog horse motor person plant sheep sofa train TV mAP
    FeV+LV[35] 98.4 92.8 93.4 90.7 74.9 93.2 90.2 96.1 78.2 89.8 80.6 95.7 96.1 95.3 97.5 73.1 91.2 75.4 97.0 88.2 89.4
    RCP[36] 99.3 92.2 97.5 94.9 82.3 94.1 92.4 98.5 83.8 93.5 83.1 98.1 97.3 96.0 98.8 77.7 95.1 79.4 97.7 92.4 92.2
    RMIC[37] 98.0 85.5 92.6 88.7 64.0 86.8 82.0 94.9 72.7 83.1 73.4 95.2 91.7 90.8 95.5 58.3 87.6 70.6 93.8 83.0 84.4
    SSGRL[7]* 99.5 95.1 97.4 96.4 85.8 94.5 93.7 98.9 86.7 96.3 84.6 98.9 98.6 96.2 98.7 82.2 98.2 84.2 98.1 93.5 93.9
    SSGRL(pre)[7]* 99.7 96.1 97.7 96.5 86.9 95.8 95.0 98.9 88.3 97.6 87.4 99.1 99.2 97.3 99.0 84.8 98.3 85.8 99.2 94.1 94.8
    MSRN[19] 99.7 95.2 98.3 96.3 84.8 96.5 93.3 99.6 87.4 96.0 86.3 98.9 98.3 96.9 98.8 80.6 97.7 80.5 99.4 94.7 94.0
    MSRN(pre)[19] 99.8 96.3 98.4 96.8 85.2 97.5 95.2 99.6 88.0 96.6 89.8 99.0 98.8 96.8 99.0 84.5 97.1 85.8 99.3 95.9 95.0
    DSDL[29] 99.4 95.3 97.6 95.7 83.5 94.8 93.9 98.5 85.7 94.5 83.8 98.4 97.7 95.9 98.5 80.6 95.7 82.3 98.2 93.2 93.2
    MCAR[21] 99.6 97.1 98.3 96.6 87.0 95.5 94.4 98.8 87.0 96.9 85.0 98.7 98.3 97.3 99.0 83.8 96.8 83.7 98.3 93.5 94.3
    SIG-MLIC 99.9 96.4 98.4 97.3 86.5 96.5 95.4 99.3 87.9 98.0 85.0 98.9 99.2 97.9 98.5 85.5 98.5 83.7 99.3 95.0 94.8
    SIG-MLIC(pre) 100.0 97.8 98.8 98.1 88.9 97.5 96.0 99.5 89.7 98.8 88.1 99.3 99.7 98.4 98.9 88.4 98.7 87.0 99.6 96.7 96.0
     Note: Bold numbers indicate the best value in each column and underlined numbers the second-best; "pre" means the model was pre-trained on the MS-COCO dataset; * indicates an input image size of 576×576.

    Table 3  Comparison of different methods on the MS-COCO dataset (%)

    Method mAP CP(top-3) CP(all) CR(top-3) CR(all) CF1(top-3) CF1(all) OP(top-3) OP(all) OR(top-3) OR(all) OF1(top-3) OF1(all)
    CPCL[6] 82.8 89.0 85.6 63.5 71.1 74.1 77.6 90.5 86.1 65.9 74.5 76.3 79.9
    SSGRL[7]* 83.8 91.9 89.9 62.5 68.5 72.7 76.8 93.8 91.3 64.1 70.8 76.2 79.7
    MSRN[19] 83.4 84.5 86.5 72.9 71.5 78.3 78.3 84.3 86.1 76.8 75.5 80.4 80.4
    ML-GCN[8] 83.0 89.2 85.1 64.1 72.0 74.6 78.0 90.5 85.8 66.5 75.4 76.7 80.3
    DSDL[29] 81.7 88.1 84.1 62.9 70.4 73.4 76.7 89.6 85.1 65.3 73.9 75.6 79.1
    MCAR[21] 83.8 88.1 85.0 65.5 72.1 75.1 78.0 91.0 88.0 66.3 73.9 76.7 80.3
    SIG-MLIC 85.6 90.2 86.4 66.6 75.1 76.6 80.3 91.9 87.7 67.9 77.5 78.1 82.3
     Note: "top-3" columns use only the three highest-scoring labels of each image and "all" columns use all labels; underlined numbers indicate the second-best value in each column.
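    The CP/CR/CF1 and OP/OR/OF1 columns in Table 3 are the per-class and overall precision, recall, and F1 measures, each reported both over all predicted labels and over only the top-3 highest-scoring labels per image. The sketch below shows one common way these quantities are computed; the 0.5 threshold, toy arrays, and function names are assumptions for illustration, not the paper's evaluation code.

```python
import numpy as np

def multilabel_metrics(y_true, y_pred):
    """OP/OR/OF1 pool counts over all labels; CP/CR/CF1 average per-class precision/recall."""
    tp = (y_true * y_pred).sum(axis=0).astype(float)     # true positives per label
    pred_pos = y_pred.sum(axis=0).astype(float)          # predicted positives per label
    true_pos = y_true.sum(axis=0).astype(float)          # ground-truth positives per label
    OP = tp.sum() / max(pred_pos.sum(), 1e-12)
    OR = tp.sum() / max(true_pos.sum(), 1e-12)
    CP = np.mean(tp / np.maximum(pred_pos, 1e-12))
    CR = np.mean(tp / np.maximum(true_pos, 1e-12))
    OF1 = 2 * OP * OR / max(OP + OR, 1e-12)
    CF1 = 2 * CP * CR / max(CP + CR, 1e-12)
    return CP, CR, CF1, OP, OR, OF1

# "All labels": threshold the scores; "top-3": keep only each image's 3 highest-scoring labels.
rng = np.random.default_rng(0)
y_true  = (rng.random((8, 10)) > 0.7).astype(int)   # placeholder ground truth, 8 images x 10 labels
y_score = rng.random((8, 10))                       # placeholder predicted scores

pred_all  = (y_score > 0.5).astype(int)
top3_idx  = np.argsort(-y_score, axis=1)[:, :3]
pred_top3 = np.zeros_like(pred_all)
np.put_along_axis(pred_top3, top3_idx, 1, axis=1)

print(multilabel_metrics(y_true, pred_all))   # "all labels" columns
print(multilabel_metrics(y_true, pred_top3))  # "top-3 labels" columns
```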

    Table 4  Ablation experiment on the Pascal VOC 2007 dataset

    Scheme Name mAP/%
    1 ResNet-101 92.6
    2 SIG-MLIC w/o D 93.4
    3 SIG-MLIC w/o SGA_1 95.1
    4 SIG-MLIC w/o SGA_2 89.5
    Proposed SIG-MLIC 95.5
     Note: Bold numbers indicate the best value.
  • [1] GE Z Y, JIANG X H, TONG Z, et al. Multi-label correlation guided feature fusion network for abnormal ECG diagnosis[J]. Knowledge-Based Systems, 2021, 233: 107508. doi: 10.1016/j.knosys.2021.107508
    [2] CHEN L, LI Z D, ZENG T, et al. Predicting gene phenotype by multi-label multi-class model based on essential functional features[J]. Molecular Genetics and Genomics, 2021, 296(4): 905-918. doi: 10.1007/s00438-021-01789-8
    [3] DING X M, LI B, XIONG W H, et al. Multi-instance multi-label learning combining hierarchical context and its application to image annotation[J]. IEEE Transactions on Multimedia, 2016, 18(8): 1616-1627. doi: 10.1109/TMM.2016.2572000
    [4] LIU S L, ZHANG L, YANG X, et al. Query2Label: a simple transformer way to multi-label classification[EB/OL]. (2021-07-22) [2023-06-02]. http://arxiv.org/abs/2107.10834.
    [5] WANG J, YANG Y, MAO J H, et al. CNN-RNN: a unified framework for multi-label image classification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 2285-2294.
    [6] ZHOU F T, HUANG S, LIU B, et al. Multi-label image classification via category prototype compositional learning[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(7): 4513-4525. doi: 10.1109/TCSVT.2021.3128054
    [7] CHEN T S, XU M X, HUI X L, et al. Learning semantic-specific graph representation for multi-label image recognition[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 522-531.
    [8] CHEN Z M, WEI X S, WANG P, et al. Multi-label image recognition with graph convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 5177-5186.
    [9] LI Q, PENG X J, QIAO Y, et al. Learning label correlations for multi-label image recognition with graph networks[J]. Pattern Recognition Letters, 2020, 138: 378-384. doi: 10.1016/j.patrec.2020.07.040
    [10] WANG Y T, XIE Y Z, LIU Y, et al. Fast graph convolution network based multi-label image recognition via cross-modal fusion[C]//Proceedings of the 29th ACM International Conference on Information & Knowledge Management. New York: ACM, 2020: 1575-1584.
    [11] CHEN B Z, LI J X, LU G M, et al. Label co-occurrence learning with graph convolutional networks for multi-label chest X-ray image classification[J]. IEEE Journal of Biomedical and Health Informatics, 2020, 24(8): 2292-2302. doi: 10.1109/JBHI.2020.2967084
    [12] NIU S J, XU Q, ZHU P F, et al. Coupled dictionary learning for multi-label embedding[C]//Proceedings of the International Joint Conference on Neural Networks. Piscataway: IEEE Press, 2019: 1-8.
    [13] ZHENG J Y, ZHU W C, ZHU P F. Multi-label quadruplet dictionary learning[C]//Proceedings of the Artificial Neural Networks and Machine Learning. Berlin: Springer, 2020: 119-131.
    [14] JING X Y, WU F, LI Z Q, et al. Multi-label dictionary learning for image annotation[J]. IEEE Transactions on Image Processing, 2016, 25(6): 2712-2725. doi: 10.1109/TIP.2016.2549459
    [15] EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The pascal visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303-338. doi: 10.1007/s11263-009-0275-4
    [16] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2014: 740-755.
    [17] ZHU Y, KWOK J T, ZHOU Z H. Multi-label learning with global and local label correlation[J]. IEEE Transactions on Knowledge and Data Engineering, 2018, 30(6): 1081-1094. doi: 10.1109/TKDE.2017.2785795
    [18] WENG W, WEI B W, KE W, et al. Learning label-specific features with global and local label correlation for multi-label classification[J]. Applied Intelligence, 2023, 53(3): 3017-3033. doi: 10.1007/s10489-022-03386-7
    [19] QU X W, CHE H, HUANG J, et al. Multi-layered semantic representation network for multi-label image classification[J]. International Journal of Machine Learning and Cybernetics, 2023, 14(10): 3427-3435.
    [20] CHEN T S, WANG Z X, LI G B, et al. Recurrent attentional reinforcement learning for multi-label image recognition[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2017: 6730-6737.
    [21] GAO B B, ZHOU H Y. Learning to discover multi-class attentional regions for multi-label image recognition[J]. IEEE Transactions on Image Processing, 2021, 30: 5920-5932. doi: 10.1109/TIP.2021.3088605
    [22] WANG Y T, XIE Y Z, ZENG J F, et al. Cross-modal fusion for multi-label image classification with attention mechanism[J]. Computers and Electrical Engineering, 2022, 101: 108002. doi: 10.1016/j.compeleceng.2022.108002
    [23] WU T, HUANG Q Q, LIU Z W, et al. Distribution-balanced loss for multi-label classification in long-tailed datasets[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2020: 162-178.
    [24] RIDNIK T, BEN-BARUCH E, ZAMIR N, et al. Asymmetric loss for multi-label classification[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2021: 82-91.
    [25] DONG J X. Focal loss improves the model performance on multi-label image classifications with imbalanced data[C]//Proceedings of the 2nd International Conference on Industrial Control Network and System Engineering Research. New York: ACM, 2020: 18-21.
    [26] CAO X C, ZHANG H, GUO X J, et al. SLED: semantic label embedding dictionary representation for multilabel image annotation[J]. IEEE Transactions on Image Processing, 2015, 24(9): 2746-2759. doi: 10.1109/TIP.2015.2428055
    [27] ZHAO D D, YI M H, GUO J X, et al. A novel image classification method based on multi-layer dictionary learning[C]//Proceedings of the CAA Symposium on Fault Detection, Supervision, and Safety for Technical Processes. Piscataway: IEEE Press, 2021: 1-6.
    [28] OU L, HE Y, LIAO S L, et al. FaceIDP: face identification differential privacy via dictionary learning neural networks[J]. IEEE Access, 2023, 11: 31829-31841. doi: 10.1109/ACCESS.2023.3260260
    [29] ZHOU F T, HUANG S, XING Y. Deep semantic dictionary learning for multi-label image classification[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021: 3572-3580.
    [30] HUANG S, LIN J K, HUANGFU L W. Class-prototype discriminative network for generalized zero-shot learning[J]. IEEE Signal Processing Letters, 2020, 27: 301-305. doi: 10.1109/LSP.2020.2968213
    [31] XING C, ROSTAMZADEH N, ORESHKIN B N, et al. Adaptive cross-modal few-shot learning[C]//Proceedings of the Annual Conference on Neural Information Processing Systems. La Jolla: NIPS, 2019: 4848-4858.
    [32] HE X T, PENG Y X. Fine-grained image classification via combining vision and language[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 7332-7340.
    [33] PENNINGTON J, SOCHER R, MANNING C. Glove: global vectors for word representation[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing. Stroudsburg: Association for Computational Linguistics, 2014: 1532-1543.
    [34] WANG Z, FANG Z L, LI D D, et al. Semantic supplementary network with prior information for multi-label image classification[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(4): 1848-1859. doi: 10.1109/TCSVT.2021.3083978
    [35] YANG H, ZHOU J T, ZHANG Y, et al. Exploit bounding box annotations for multi-label object recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 280-288.
    [36] WANG M, LUO C Z, HONG R C, et al. Beyond object proposals: random crop pooling for multi-label image recognition[J]. IEEE Transactions on Image Processing, 2016, 25(12): 5678-5688. doi: 10.1109/TIP.2016.2612829
    [37] HE S, XU C, GUO T, et al. Reinforced multi-label image classification by exploring curriculum[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2018: 3183-3190.
Publication history
  • Received:  2023-06-16
  • Accepted:  2023-09-15
  • Published online:  2023-11-08
  • Issue published:  2025-07-31
