Abstract: Most deep-learning-based image retrieval methods are supervised and require massive amounts of labeled data, yet labeling data at that scale is difficult and expensive in real applications. Moreover, existing triplet losses computed with the Euclidean distance are insufficiently precise, which limits a model's ability to learn image similarity. This work proposes a novel semi-supervised hash image retrieval model (SSITL) that combines entropy-minimized pseudo-labels, a triplet hash loss, and semi-supervised learning. A multi-stage model combination with a sharpening technique generates pseudo-labels for unlabeled data, and entropy minimization raises the confidence of those pseudo-labels. Triplets are selected from the clustering results of both labeled and unlabeled data, and a triplet hash loss based on a channel weight matrix (CWT loss) helps SSITL learn image similarity. To generate better hash codes, MixUp blends two Hamming embeddings into a new embedding, improving retrieval performance. Experimental results show that, at a comparable time cost, SSITL improves the mean average precision over competing methods by 1.2% on CIFAR-10 and 0.7% on NUS-WIDE, demonstrating that SSITL is an excellent semi-supervised hashing model for image retrieval.
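The pseudo-labeling step builds on the sharpening idea of MixMatch [16], which the paper cites. The following minimal sketch (with assumed function names and an assumed temperature T, not the paper's exact implementation) illustrates how averaging the multi-stage models' predictions and then sharpening the average lowers its entropy, which is what raises pseudo-label confidence:

```python
import numpy as np

def sharpen(p, T=0.5):
    """Temperature sharpening in the spirit of MixMatch [16]: raising each
    probability to 1/T and renormalizing lowers the entropy of the
    distribution, so the pseudo-label becomes more confident."""
    q = p ** (1.0 / T)
    return q / q.sum()

def make_pseudo_label(stage_probs, T=0.5):
    """Average the class probabilities predicted by the multi-stage models
    for one unlabeled image (one row per model), then sharpen the average."""
    return sharpen(stage_probs.mean(axis=0), T)
```

For instance, with two model outputs [0.6, 0.4] and [0.5, 0.5], the plain average is [0.55, 0.45], while the sharpened pseudo-label at T = 0.5 is roughly [0.60, 0.40].

The MixUp step on Hamming embeddings can be sketched the same way, following the convex-combination rule of [21]; the Beta parameter and the max trick below are assumptions carried over from MixMatch [16], not values reported for SSITL:

```python
import numpy as np

def mixup_hamming(e1, e2, alpha=0.75, rng=None):
    """Blend two continuous Hamming embeddings into a new one by a convex
    combination, as in MixUp [21]. Taking lam = max(lam, 1 - lam) keeps
    the mixture biased toward e1 (a MixMatch-style choice)."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    return lam * e1 + (1.0 - lam) * e2
```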
Key words:
- image retrieval
- triplet hash loss
- semi-supervised learning
- pseudo-labels
- deep learning
Table 1. MAP scores of different methods on the CIFAR-10 dataset

| Method | 12 bits | 24 bits | 32 bits | 48 bits |
| --- | --- | --- | --- | --- |
| SSITL | 0.818 | 0.838 | 0.853 | 0.854 |
| ABML[19] | 0.815 | 0.832 | 0.850 | 0.851 |
| CPQN[18] | 0.817 | 0.830 | 0.848 | 0.852 |
| BGDH[30] | 0.805 | 0.824 | 0.826 | 0.833 |
| DSH-GAN[31] | 0.751 | 0.801 | 0.807 | 0.811 |
| SSDH[17] | 0.802 | 0.810 | 0.816 | 0.819 |
| DPSH[27] | 0.737 | 0.775 | 0.801 | 0.798 |
| DSDH[29] | 0.738 | 0.784 | 0.795 | 0.818 |
| DRSCH[28] | 0.616 | 0.625 | 0.630 | 0.629 |
| SDH[25] | 0.438 | 0.520 | 0.558 | 0.587 |
| ITQ[26] | 0.219 | 0.242 | 0.250 | 0.252 |
Table 2. MAP scores of different methods on the NUS-WIDE dataset

| Method | 12 bits | 24 bits | 32 bits | 48 bits |
| --- | --- | --- | --- | --- |
| SSITL | 0.838 | 0.857 | 0.880 | 0.873 |
| ABML[19] | 0.835 | 0.851 | 0.872 | 0.869 |
| CPQN[18] | 0.833 | 0.849 | 0.869 | 0.870 |
| BGDH[30] | 0.805 | 0.824 | 0.826 | 0.833 |
| DSH-GAN[31] | 0.828 | 0.843 | 0.848 | 0.851 |
| SSDH[17] | 0.803 | 0.808 | 0.826 | 0.833 |
| DPSH[27] | 0.767 | 0.778 | 0.795 | 0.798 |
| DSDH[29] | 0.772 | 0.804 | 0.821 | 0.831 |
| DRSCH[28] | 0.616 | 0.623 | 0.627 | 0.627 |
| SDH[25] | 0.541 | 0.548 | 0.579 | 0.621 |
| ITQ[26] | 0.663 | 0.700 | 0.707 | 0.723 |
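For context, the MAP values in Tables 1 and 2 follow the standard retrieval protocol: each query ranks the whole database by Hamming distance, and precision is averaged over every rank at which a relevant image appears. The sketch below assumes single-label data (for the multi-label NUS-WIDE, relevance is usually defined as sharing at least one tag; that variant is omitted); the function name and top-k option are ours:

```python
import numpy as np

def mean_average_precision(query_codes, db_codes, query_labels, db_labels, topk=None):
    """MAP over a set of queries: rank the database by Hamming distance to
    each query code and average precision at every rank that hits a
    relevant (same-label) image. Codes are 0/1 integer arrays."""
    aps = []
    for q, ql in zip(query_codes, query_labels):
        dist = np.count_nonzero(db_codes != q, axis=1)  # Hamming distances
        order = np.argsort(dist, kind="stable")         # closest first
        rel = (db_labels[order] == ql).astype(float)    # 1 where relevant
        if topk is not None:
            rel = rel[:topk]
        hits = np.cumsum(rel)
        if hits[-1] == 0:                               # no relevant item retrieved
            aps.append(0.0)
            continue
        ranks = np.arange(1, rel.size + 1)
        aps.append(float(np.sum(rel * hits / ranks) / hits[-1]))
    return float(np.mean(aps))
```

A 12-bit evaluation on CIFAR-10, for example, would pass 0/1 code matrices of width 12 together with integer class labels.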
Table 3. MAP scores of different SSITL variants on the CIFAR-10 and NUS-WIDE datasets

| Dataset | Method | 12 bits | 24 bits | 32 bits | 48 bits |
| --- | --- | --- | --- | --- | --- |
| CIFAR-10 | SSITL | 0.818 | 0.838 | 0.853 | 0.854 |
| CIFAR-10 | Nmid SSITL | 0.781 | 0.812 | 0.828 | 0.831 |
| CIFAR-10 | Nsim SSITL | 0.737 | 0.764 | 0.785 | 0.778 |
| NUS-WIDE | SSITL | 0.838 | 0.857 | 0.880 | 0.873 |
| NUS-WIDE | Nmid SSITL | 0.807 | 0.823 | 0.844 | 0.857 |
| NUS-WIDE | Nsim SSITL | 0.775 | 0.788 | 0.815 | 0.820 |
Table 4. MAP scores of unseen-class retrieval with different methods on the CIFAR-10 dataset

| Method | 12 bits | 24 bits | 32 bits | 48 bits |
| --- | --- | --- | --- | --- |
| SSITL | 0.319 | 0.327 | 0.349 | 0.358 |
| ABML[19] | 0.316 | 0.325 | 0.337 | 0.348 |
| BGDH[30] | 0.267 | 0.279 | 0.284 | 0.294 |
| DSH-GAN[31] | 0.281 | 0.288 | 0.299 | 0.310 |
| SSDH[17] | 0.287 | 0.291 | 0.309 | 0.317 |
| DPSH[27] | 0.264 | 0.275 | 0.281 | 0.293 |
| DSDH[29] | 0.255 | 0.263 | 0.278 | 0.288 |
| DRSCH[28] | 0.217 | 0.218 | 0.233 | 0.251 |
| SDH[25] | 0.190 | 0.192 | 0.197 | 0.206 |
| ITQ[26] | 0.153 | 0.162 | 0.192 | 0.199 |
Table 5. MAP scores of unseen-class retrieval with different methods on the NUS-WIDE dataset

| Method | 12 bits | 24 bits | 32 bits | 48 bits |
| --- | --- | --- | --- | --- |
| SSITL | 0.537 | 0.553 | 0.581 | 0.584 |
| ABML[19] | 0.532 | 0.550 | 0.574 | 0.582 |
| BGDH[30] | 0.511 | 0.529 | 0.545 | 0.538 |
| DSH-GAN[31] | 0.508 | 0.539 | 0.542 | 0.541 |
| SSDH[17] | 0.514 | 0.534 | 0.538 | 0.549 |
| DPSH[27] | 0.487 | 0.512 | 0.514 | 0.527 |
| DSDH[29] | 0.255 | 0.263 | 0.278 | 0.288 |
| DRSCH[28] | 0.458 | 0.463 | 0.471 | 0.468 |
| SDH[25] | 0.468 | 0.489 | 0.491 | 0.505 |
| ITQ[26] | 0.490 | 0.486 | 0.493 | 0.507 |
Table 6. Unseen-class retrieval using activation channels of different layers on the CIFAR-10 dataset

| Activation channel layer | 12 bits | 24 bits | 32 bits | 48 bits |
| --- | --- | --- | --- | --- |
| Layer 5 | 0.226 | 0.233 | 0.251 | 0.273 |
| Layer 30 | 0.286 | 0.305 | 0.329 | 0.343 |
| Layer 100 | 0.319 | 0.327 | 0.349 | 0.358 |
| Layer 150 | 0.308 | 0.319 | 0.346 | 0.354 |
Table 7. Unseen-class retrieval using activation channels of different layers on the NUS-WIDE dataset

| Activation channel layer | 12 bits | 24 bits | 32 bits | 48 bits |
| --- | --- | --- | --- | --- |
| Layer 5 | 0.462 | 0.473 | 0.481 | 0.483 |
| Layer 30 | 0.504 | 0.511 | 0.539 | 0.542 |
| Layer 100 | 0.537 | 0.553 | 0.581 | 0.584 |
| Layer 150 | 0.518 | 0.532 | 0.574 | 0.579 |
Table 8. Unseen-class retrieval using different weight values on the CIFAR-10 dataset

| Weight value | 12 bits | 24 bits | 32 bits | 48 bits |
| --- | --- | --- | --- | --- |
| $\lambda_2 = 0$ | 0.275 | 0.286 | 0.305 | 0.311 |
| $\lambda_2 = 0.2$ | 0.302 | 0.311 | 0.332 | 0.343 |
| $\lambda_2 = 0.5$ | 0.319 | 0.327 | 0.349 | 0.358 |
| $\lambda_2 = 1$ | 0.284 | 0.306 | 0.329 | 0.338 |
| $\lambda_2 = 2$ | 0.265 | 0.281 | 0.297 | 0.301 |
Table 9. Unseen-class retrieval using different weight values on the NUS-WIDE dataset

| Weight value | 12 bits | 24 bits | 32 bits | 48 bits |
| --- | --- | --- | --- | --- |
| $\lambda_2 = 0$ | 0.508 | 0.517 | 0.526 | 0.539 |
| $\lambda_2 = 0.2$ | 0.519 | 0.542 | 0.559 | 0.571 |
| $\lambda_2 = 0.5$ | 0.537 | 0.553 | 0.581 | 0.584 |
| $\lambda_2 = 1$ | 0.513 | 0.537 | 0.541 | 0.564 |
| $\lambda_2 = 2$ | 0.497 | 0.511 | 0.530 | 0.543 |
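Tables 8 and 9 vary the loss weight $\lambda_2$, with $\lambda_2 = 0.5$ performing best on both datasets. The sketch below is one plausible reading of how such a weight could combine a channel-weighted triplet term with the supervised term; the per-channel weight vector and the exact form of the CWT loss are assumptions for illustration, not the paper's published formulation:

```python
import numpy as np

def channel_weighted_triplet(h_a, h_p, h_n, w, margin=0.5):
    """Triplet margin loss over hash embeddings of shape (B, L), with a
    per-channel weight vector w of shape (L,) emphasizing informative
    bits (a hypothetical stand-in for the channel weight matrix)."""
    d_pos = np.sum(w * (h_a - h_p) ** 2, axis=1)  # weighted anchor-positive distance
    d_neg = np.sum(w * (h_a - h_n) ** 2, axis=1)  # weighted anchor-negative distance
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

def total_loss(supervised_term, triplet_term, lambda_2=0.5):
    """Assumed combination: lambda_2 trades the triplet term off against
    the supervised term; Tables 8-9 put the best value at 0.5."""
    return supervised_term + lambda_2 * triplet_term
```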
[1] LI W, DUAN L X, XU D, et al. Text-based image retrieval using progressive multi-instance learning[C]//Proceedings of the 2011 International Conference on Computer Vision. Piscataway: IEEE Press, 2011: 2049-2055.
[2] LIU Y, ZHANG D S, LU G J, et al. A survey of content-based image retrieval with high-level semantics[J]. Pattern Recognition, 2007, 40(1): 262-282. doi: 10.1016/j.patcog.2006.04.045
[3] CHEN R Y, PAN L L, LI C, et al. An improved deep fusion CNN for image recognition[J]. Computers, Materials & Continua, 2020, 65(2): 1691-1706.
[4] LAI H J, PAN Y, YE L, et al. Simultaneous feature learning and hash coding with deep neural networks[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 3270-3278.
[5] CHEN Y B, MANCINI M, ZHU X T, et al. Semi-supervised and unsupervised deep visual learning: a survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(3): 1327-1347. doi: 10.1109/TPAMI.2022.3201576
[6] LIU Y, CHENG M, WANG F P, et al. Deep hashing image retrieval methods[J]. Journal of Image and Graphics, 2020, 25(7): 1296-1317 (in Chinese). doi: 10.11834/jig.190518
[7] ZHU X, GOLDBERG A B. Introduction to semi-supervised learning[J]. Synthesis Lectures on Artificial Intelligence and Machine Learning, 2009, 3(1): 1-130.
[8] SCHROFF F, KALENICHENKO D, PHILBIN J. FaceNet: a unified embedding for face recognition and clustering[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 815-823.
[9] SONG H O, XIANG Y, JEGELKA S, et al. Deep metric learning via lifted structured feature embedding[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 4004-4012.
[10] ZHENG D G, LIU G J, MAO Y B, et al. Deep face hashing based on ternary-group loss function[J]. Journal of Terahertz Science and Electronic Information Technology, 2021, 19(2): 313-318 (in Chinese). doi: 10.11805/TKYDA2018108
[11] DU Y J, LI H S, YAO C L, et al. Monocular image based 3D model retrieval using triplet network[J]. Journal of Beijing University of Aeronautics and Astronautics, 2020, 46(9): 1691-1700 (in Chinese).
[12] LIU H Y, HUANG H E, ZHENG S B. View consistency triplet loss for vehicle re-identification[J]. Measurement & Control Technology, 2021, 40(8): 47-53, 63 (in Chinese).
[13] LIAO S C, SHAO L. Graph sampling based deep metric learning for generalizable person re-identification[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2022: 7349-7358.
[14] YANG S, ZHANG Y F, ZHAO Q H, et al. Prototype-based support example miner and triplet loss for deep metric learning[J]. Electronics, 2023, 12(15): 3315. doi: 10.3390/electronics12153315
[15] LI Z, KO B, CHOI H J. Naive semi-supervised deep learning using pseudo-label[J]. Peer-to-Peer Networking and Applications, 2019, 12(5): 1358-1368. doi: 10.1007/s12083-018-0702-9
[16] BERTHELOT D, CARLINI N, GOODFELLOW I, et al. MixMatch: a holistic approach to semi-supervised learning[EB/OL]. (2019-10-23)[2023-05-23]. http://doi.org/10.48550/arXiv.1905.02249.
[17] ZHANG J, PENG Y X. SSDH: semi-supervised deep hashing for large-scale image retrieval[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2017, 29(1): 212-225.
[18] GUO Z T, HONG C Q, ZHUANG W W, et al. CPQN: central product quantization network for semi-supervised image retrieval[C]//Proceedings of the 2021 IEEE International Conference on Big Data. Piscataway: IEEE Press, 2021: 3183-3190.
[19] WANG G A, HU Q H, YANG Y, et al. Adversarial binary mutual learning for semi-supervised deep hashing[J]. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(8): 4110-4124. doi: 10.1109/TNNLS.2021.3055834
[20] WEI X, WANG J J, ZHANG S L, et al. ReLSL: reliable label selection and learning based algorithm for semi-supervised learning[J]. Chinese Journal of Computers, 2022, 45(6): 1147-1160 (in Chinese). doi: 10.11897/SP.J.1016.2022.01147
[21] ZHANG H Y, CISSE M, DAUPHIN Y N, et al. Mixup: beyond empirical risk minimization[EB/OL]. (2018-04-27)[2023-05-25]. http://doi.org/10.48550/arXiv.1710.09412.
[22] WANG G A, HU Q H, YANG Y, et al. Adversarial binary mutual learning for semi-supervised deep hashing[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 33(8): 4110-4124.
[23] KRIZHEVSKY A, HINTON G. Convolutional deep belief networks on CIFAR-10[J]. Unpublished manuscript, 2010, 40(7): 1-9.
[24] CHUA T S, TANG J H, HONG R C, et al. NUS-WIDE: a real-world web image database from National University of Singapore[C]//Proceedings of the ACM International Conference on Image and Video Retrieval. New York: ACM, 2009: 1-9.
[25] SHEN F M, SHEN C H, LIU W, et al. Supervised discrete hashing[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 37-45.
[26] GONG Y C, LAZEBNIK S, GORDO A, et al. Iterative quantization: a Procrustean approach to learning binary codes for large-scale image retrieval[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 35(12): 2916-2929.
[27] LI W J, WANG S, KANG W C. Feature learning based deep supervised hashing with pairwise labels[EB/OL]. (2016-04-21)[2023-05-27]. http://doi.org/10.48550/arXiv.1511.03855.
[28] ZHANG R M, LIN L, ZHANG R, et al. Bit-scalable deep hashing with regularized similarity learning for image retrieval and person re-identification[J]. IEEE Transactions on Image Processing, 2015, 24(12): 4766-4779. doi: 10.1109/TIP.2015.2467315
[29] LI Q, SUN Z, HE R, et al. Deep supervised discrete hashing[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach: Curran Associates, 2017: 2479-2488.
[30] YAN X, ZHANG L, LI W J. Semi-supervised deep hashing with a bipartite graph[C]//Proceedings of the 26th International Joint Conference on Artificial Intelligence. Melbourne: IJCAI, 2017: 3238-3244.
[31] QIU Z F, PAN Y W, YAO T, et al. Deep semantic hashing with generative adversarial networks[C]//Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2017: 225-234.
[32] WANG G A, HU Q H, CHENG J, et al. Semi-supervised generative adversarial hashing for image retrieval[C]//Computer Vision – ECCV 2018. Berlin: Springer, 2018: 491-507.

