Cascaded object drift determination network for long-term visual tracking

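HOU Zhiqiang^{1, 2}, ZHAO Jiaxin^{1, 2}, CHEN Yu^{1, 2}, MA Sugang^{1, 2}, YU Wangsheng^{3}, FAN Jiulun^{1}

doi: 10.13700/j.bh.1001-5965.2023.0504

1. School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an 710121, China
2. Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an University of Posts and Telecommunications, Xi'an 710121, China
3. College of Information and Navigation, Air Force Engineering University, Xi'an 710077, China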

Funds: National Natural Science Foundation of China (62072370); Natural Science Foundation of Shaanxi Province (2023-JC-YB-598)

Corresponding author: E-mail: hzq@xupt.edu.cn

CLC number: TP391.4
Publication history
- Received: 2023-08-02
- Accepted: 2023-11-10
- Published online: 2023-11-23
- Issue published: 2025-07-14


Abstract:
Existing object drift determination criteria require manually selected thresholds and offer limited determination performance. To address these problems, this paper proposes a cascaded object drift determination network that selects its threshold adaptively. First, two sub-networks are cascaded to determine whether a tracking result has drifted. Second, the network judges each tracking result jointly against a static template, a long-term template, and a short-term template, which improves determination accuracy; a long-term and short-term template update strategy maintains template quality so that the templates adapt to changes in the object's appearance during determination. Finally, the proposed network is combined with the short-term tracker TransT and the global re-detection method GlobalTrack to build the long-term tracking algorithm TransT_LT. Experiments on four long-term tracking datasets (UAV20L, LaSOT, VOT2018-LT, and VOT2020-LT) show that the proposed algorithm achieves superior long-term tracking performance. On UAV20L in particular, it improves on the baseline algorithm by 7.7 and 10.3 percentage points in tracking success rate and precision, respectively. The drift determination network runs at 100 frames per second, so it has little effect on the speed of the long-term tracking algorithm.
Keywords:
- long-term visual tracking
- deep learning
- cascaded object drift determination network
- template update
- multi-scale feature fusion
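
The pipeline outlined in the abstract (short-term tracker, cascaded drift determination, global re-detection, and template updates) can be sketched as a control loop. This is a minimal sketch under stated assumptions, not the authors' implementation: every name here (`Templates`, `is_drift`, `track_sequence`, and the callables passed in) is hypothetical, the two sub-networks are assumed to run in sequence with the second consulted only when the first accepts a result, and `delta` is assumed to play the role of the long-term template update interval of Table 2.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class Templates:
    static: Any      # first-frame target patch, never updated
    long_term: Any   # refreshed at most once every `delta` frames
    short_term: Any  # refreshed on every frame judged reliable


def is_drift(stage1: Callable, stage2: Callable,
             templates: Templates, patch: Any) -> bool:
    """Assumed cascade: stage 2 is consulted only if stage 1 accepts the patch."""
    if stage1(templates, patch):
        return True
    return stage2(templates, patch)


def track_sequence(frames: Sequence[Any], init_box: Box,
                   local_tracker: Callable, global_redetector: Callable,
                   stage1: Callable, stage2: Callable, crop: Callable,
                   delta: int = 100) -> List[Box]:
    """Run the hypothetical long-term tracking loop and return one box per frame."""
    first_patch = crop(frames[0], init_box)
    templates = Templates(first_patch, first_patch, first_patch)
    box, results, last_long_update = init_box, [init_box], 0
    for t in range(1, len(frames)):
        # Short-term tracking around the last reliable box.
        candidate = local_tracker(frames[t], box)
        patch = crop(frames[t], candidate)
        drifted = is_drift(stage1, stage2, templates, patch)
        if drifted:
            # Local result rejected: search the whole frame, then re-check.
            candidate = global_redetector(frames[t], templates.static)
            patch = crop(frames[t], candidate)
            drifted = is_drift(stage1, stage2, templates, patch)
        if not drifted:
            # Only reliable results move the search region or touch templates.
            box = candidate
            templates.short_term = patch
            if t - last_long_update >= delta:
                templates.long_term = patch
                last_long_update = t
        results.append(candidate)
    return results
```

In this sketch the static template is never modified, so a run of bad updates cannot corrupt every reference at once; whether the paper gates its updates in exactly this way is not stated in this section.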

Figure 1. Framework of the long-term visual tracking algorithm

Figure 2. Structure of the cascaded object drift determination network

Figure 3. Schematic diagram of the LFFM module

Figure 4. Schematic diagram of the MGFFM module

Figure 5. Examples of positive and negative samples

Figure 6. Evaluation curves of tracking results on LaSOT

Figure 7. Evaluation curves of tracking results on UAV20L and comparison of average success rates by attribute

Figure 8. Qualitative analysis

Figure 9. Comparison of the average speed of visual tracking algorithms

Figure 10. Qualitative analysis on the scale variation attribute after adding the second-level object drift determination network

Table 1. Effect of different values of the parameters $ \partial $ and $ \beta $ on long-term visual tracking performance

$ \left( {\partial ,\beta } \right) $	UAV20L success rate	UAV20L precision	VOT2020-LT F-score	VOT2020-LT precision	VOT2020-LT recall
(0.5,0.6)	0.635	0.826	0.648	0.693	0.608
(0.5,0.7)	0.649	0.839	0.660	0.703	0.623
(0.5,0.8)	0.654	0.862	0.661	0.711	0.618
(0.5,0.9)	0.666	0.866	0.664	0.653	0.675
(0.6,0.7)	0.671	0.870	0.664	0.709	0.625
(0.6,0.8)	0.672	0.872	0.665	0.653	0.677
(0.6,0.9)	0.693	0.898	0.671	0.715	0.632
(0.7,0.8)	0.673	0.870	0.665	0.713	0.624
(0.7,0.9)	0.672	0.875	0.665	0.711	0.622
Note: Bold font indicates the best value.

Table 2. Effect of the long-term template update interval $ \delta $ on long-term visual tracking performance

$ \delta $ (frames)	UAV20L success rate	UAV20L precision	VOT2020-LT F-score	VOT2020-LT precision	VOT2020-LT recall
50	0.666	0.859	0.665	0.713	0.623
100	0.693	0.898	0.671	0.715	0.632
150	0.689	0.893	0.669	0.711	0.631
200	0.686	0.888	0.664	0.713	0.622
250	0.685	0.886	0.661	0.712	0.618
300	0.672	0.866	0.660	0.715	0.613
Note: Bold font indicates the best value.

Table 3. Optimal thresholds of different determination criteria on two datasets

Determination criterion	UAV20L	VOT2020-LT
Fmax^[4]	>0.9	>0.95
PSR^[8]	>3	>5
APCE^[9]	<17	<19
RT-MDNet^[10]	/	/
MDNet^[12]	/	/
Ours	/	/
Note: "/" indicates that no threshold needs to be selected.
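
The hand-tuned criteria in Table 3 reduce a tracker's response map to a single confidence score and compare it with a fixed threshold. As an illustration, the following sketch computes the average peak-to-correlation energy (APCE) of reference [9] in its commonly stated form, $ |F_{\max} - F_{\min}|^2 / \operatorname{mean}\left( (F_{w,h} - F_{\min})^2 \right) $. The function name `apce` and the toy response maps are assumptions for illustration; how the thresholds in Table 3 are applied (including the direction of each inequality) is not explained in this section, so the sketch stops at the score.

```python
from typing import Sequence


def apce(response: Sequence[Sequence[float]]) -> float:
    """Average peak-to-correlation energy of a 2-D response map.

    A single sharp peak gives a high score; a flat or multi-peak map
    gives a low score.
    """
    values = [v for row in response for v in row]
    f_max, f_min = max(values), min(values)
    # Mean squared distance of every response value from the minimum.
    mean_sq = sum((v - f_min) ** 2 for v in values) / len(values)
    if mean_sq == 0:
        return 0.0  # completely flat map: no peak at all
    return (f_max - f_min) ** 2 / mean_sq


# Toy maps (hypothetical): one clean peak versus a smeared response.
sharp = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
smeared = [[0.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 1.0, 0.0]]
print(apce(sharp), apce(smeared))
```

Because the useful threshold depends on the dataset, as the differing UAV20L and VOT2020-LT entries in Table 3 show, a learned criterion that needs no threshold avoids this per-dataset tuning.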

Table 4. Performance of different determination criteria at their optimal thresholds on two datasets

Determination criterion	UAV20L success rate	UAV20L precision	VOT2020-LT F-score	VOT2020-LT precision	VOT2020-LT recall
Fmax^[4]	0.632	0.817	0.644	0.696	0.599
PSR^[8]	0.639	0.824	0.648	0.704	0.600
APCE^[9]	0.641	0.828	0.651	0.709	0.602
RT-MDNet^[10]	0.624	0.806	0.641	0.712	0.590
MDNet^[12]	0.630	0.813	0.646	0.698	0.601
Ours	0.693	0.898	0.671	0.715	0.632
Note: Bold font indicates the best value.

Table 5. Ablation experiment results

TransT	TTD	LTUS	CDDN	UAV20L success rate	UAV20L precision	VOT2020-LT F-score	VOT2020-LT precision	VOT2020-LT recall
√				0.616	0.795	0.638	0.682	0.599
√	√			0.658	0.857	0.657	0.669	0.645
√	√	√		0.688	0.892	0.668	0.660	0.676
√	√	√	√	0.693	0.898	0.671	0.715	0.632


Table 6. Tracking results of different visual tracking algorithms on the VOT2018-LT dataset

Algorithm	F-score	Precision	Recall
TANet^[4]	0.586	0.649	0.535
LTMU^[5]	0.690	0.710	0.672
ELGLT^[6]	0.638	0.669	0.610
LGST^[7]	0.630	0.637	0.622
MBMD^[11]	0.610	0.634	0.588
TransT^[14]	0.670	0.714	0.631
GlobalTrack^[15]	0.555	0.503	0.528
SPLT^[26]	0.616	0.633	0.600
Siam R-CNN^[27]	0.668	0.667	0.675
LTST^[39]	0.636	0.653	0.620
MTTNet^[40]	0.621	0.634	0.609
TransT_LT (ours)	0.693	0.736	0.654

Table 7. Tracking results of different visual tracking algorithms on the VOT2020-LT dataset

Algorithm	F-score	Precision	Recall
TANet^[4]	0.515	0.568	0.513
ELGLT^[6]	0.607	0.611	0.560
LGST^[7]	0.578	0.607	0.552
MBMD^[11]	0.575	0.623	0.534
TransT^[14]	0.638	0.682	0.599
GlobalTrack^[15]	0.555	0.503	0.528
SPLT^[26]	0.565	0.587	0.544
Siam R-CNN^[27]	0.670	0.658	0.676
LTST^[39]	0.587	0.631	0.548
RLT-DiMP^[41]	0.670	0.657	0.684
TransT_LT (ours)	0.671	0.715	0.632
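
The three columns of Tables 6 and 7 are linked: assuming the usual long-term tracking definition, in which the F-score is the harmonic mean of tracking precision and recall, the reported F-score of a row can be recomputed from its other two entries. The sketch below does this for three rows copied from the tables; the helper name `f_score` and the choice of rows are illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of tracking precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) copied from Tables 6 and 7.
rows = {
    "TransT_LT, VOT2018-LT": (0.736, 0.654, 0.693),
    "TransT_LT, VOT2020-LT": (0.715, 0.632, 0.671),
    "TransT, VOT2020-LT": (0.682, 0.599, 0.638),
}
for name, (pr, rc, reported) in rows.items():
    print(f"{name}: computed {f_score(pr, rc):.3f}, reported {reported:.3f}")
```

For these rows the recomputed value matches the reported one to three decimals. The harmonic mean also shows why a gain in precision that costs recall, as in the last two rows of Table 5, changes the F-score only slightly.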

Table 8. Experimental results of the second-level object drift determination network on the scale variation attribute of the UAV20L dataset

Variant	Success rate	Precision
TransT_LT (without cascade)	0.688	0.886
TransT_LT (with cascade)	0.693	0.892

References (41)

[1]	LI X, ZHA Y F, ZHANG T Z, et al. Survey of visual object tracking algorithms based on deep learning[J]. Journal of Image and Graphics, 2019, 24(12): 2057-2080 (in Chinese). doi: 10.11834/jig.190372
[2]	LIU F, SUN Y N, WANG H J, et al. Adaptive UAV target tracking algorithm based on residual learning[J]. Journal of Beijing University of Aeronautics and Astronautics, 2020, 46(10): 1874-1882 (in Chinese).
[3]	PU L, FENG X X, HOU Z Q, et al. Siamese network visual tracking algorithm based on cascaded attention mechanism[J]. Journal of Beijing University of Aeronautics and Astronautics, 2020, 46(12): 2302-2310 (in Chinese).
[4]	WANG X, TANG J, LUO B, et al. Tracking by joint local and global search: a target-aware attention-based approach[J]. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(11): 6931-6945. doi: 10.1109/TNNLS.2021.3083933
[5]	DAI K N, ZHANG Y H, WANG D, et al. High-performance long-term tracking with meta-updater[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 6297-6306.
[6]	ZHAO H J, YAN B, WANG D, et al. Effective local and global search for fast long-term tracking[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(1): 460-474. doi: 10.1109/TPAMI.2022.3153645
[7]	GAO Z, ZHUANG Y, GU J J, et al. A joint local-global search mechanism for long-term tracking with dynamic memory network[J]. Expert Systems with Applications, 2023, 223: 119890. doi: 10.1016/j.eswa.2023.119890
[8]	BOLME D S, BEVERIDGE J R, DRAPER B A, et al. Visual object tracking using adaptive correlation filters[C]//Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2010: 2544-2550.
[9]	WANG M M, LIU Y, HUANG Z Y. Large margin object tracking with circulant feature maps[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 4800-4808.
[10]	JUNG I, SON J, BAEK M, et al. Real-time MDNet[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 83-98.
[11]	ZHANG Y, WANG D, WANG L, et al. Learning regression and verification networks for long-term visual tracking[EB/OL]. (2018-11-19)[2023-08-01]. http://arxiv.org/abs/1809.04320v1.
[12]	NAM H, HAN B. Learning multi-domain convolutional neural networks for visual tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 4293-4302.
[13]	HOU Z Q, WANG Z, PU L, et al. Target drift discriminative network based on deep learning in visual tracking[J]. Journal of Electronic Imaging, 2022, 31: 043052.
[14]	CHEN X, YAN B, ZHU J, et al. Transformer tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 8126-8135.
[15]	HUANG L H, ZHAO X, HUANG K Q. GlobalTrack: a simple and strong baseline for long-term tracking[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 11037-11044. doi: 10.1609/aaai.v34i07.6758
[16]	MUELLER M, SMITH N, GHANEM B. A benchmark and simulator for UAV tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2016: 445-461.
[17]	FAN H, LIN L T, YANG F, et al. LaSOT: a high-quality benchmark for large-scale single object tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 5369-5378.
[18]	LUKEŽIČ A, ZAJC L Č, VOJÍŘ T, et al. Now you see me: evaluating performance in long-term visual tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 4991391.
[19]	KRISTAN M, LEONARDIS A, MATAS J, et al. The eighth visual object tracking VOT2020 challenge results[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2020: 547-601.
[20]	HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7132-7141.
[21]	ZHAO H S, SHI J P, QI X J, et al. Pyramid scene parsing network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 6230-6239.
[22]	NI J J, WU J H, TONG J, et al. GC-Net: global context network for medical image segmentation[J]. Computer Methods and Programs in Biomedicine, 2020, 190: 105121. doi: 10.1016/j.cmpb.2019.105121
[23]	SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2015-04-10)[2023-08-01]. http://arxiv.org/abs/1409.1556.
[24]	DONG Q J, HE X D, GE H Y, et al. Improving model drift for robust object tracking[J]. Multimedia Tools and Applications, 2020, 79(35): 25801-25815.
[25]	WU Y, LIM J, YANG M H. Object tracking benchmark[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1834-1848. doi: 10.1109/TPAMI.2014.2388226
[26]	YAN B, ZHAO H J, WANG D, et al. 'Skimming-perusal' tracking: a framework for real-time and robust long-term tracking[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 2385-2393.
[27]	VOIGTLAENDER P, LUITEN J, TORR P H S, et al. Siam R-CNN: visual tracking by re-detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 6577-6587.
[28]	MA F, SHOU M Z, ZHU L C, et al. Unified Transformer tracker for object tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2022: 8771-8780.
[29]	FU Z H, LIU Q J, FU Z H, et al. STMTrack: template-free visual tracking with space-time memory networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 13769-13778.
[30]	BHAT G, DANELLJAN M, VAN GOOL L, et al. Learning discriminative model prediction for tracking[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 6181-6190.
[31]	GUO D Y, SHAO Y Y, CUI Y, et al. Graph attention tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 9538-9547.
[32]	LI B, WU W, WANG Q, et al. Evolution of Siamese visual tracking with very deep networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 16-20.
[33]	DANELLJAN M, BHAT G, KHAN F S, et al. ATOM: accurate tracking by overlap maximization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 4655-4664.
[34]	WANG N, ZHOU W G, WANG J, et al. Transformer meets tracker: exploiting temporal context for robust visual tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 1571-1580.
[35]	DANELLJAN M, HÄGER G, KHAN F S, et al. Learning spatially regularized correlation filters for visual tracking[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2015: 4310-4318.
[36]	HARE S, GOLODETZ S, SAFFARI A, et al. Struck: structured output tracking with kernels[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(10): 2096-2109. doi: 10.1109/TPAMI.2015.2509974
[37]	DANELLJAN M, HÄGER G, KHAN F S, et al. Discriminative scale space tracking[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(8): 1561-1575. doi: 10.1109/TPAMI.2016.2609928
[38]	KALAL Z, MIKOLAJCZYK K, MATAS J. Tracking-learning-detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(7): 1409-1422. doi: 10.1109/TPAMI.2011.239
[39]	YU L, QIAO B J, ZHANG H L, et al. LTST: long-term segmentation tracker with memory attention network[J]. Image and Vision Computing, 2022, 119: 104374. doi: 10.1016/j.imavis.2022.104374
[40]	SANG H, LI G, ZHAO Z. Multi-scale global retrieval and temporal-spatial consistency matching based long-term tracking network[J]. Chinese Journal of Electronics, 2022, 32: 1-11.
[41]	CHOI S, LEE J, LEE Y, et al. Robust long-term object tracking via improved discriminative model prediction[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2020: 602-617.