Deep reinforcement learning intelligent guidance for intercepting high maneuvering targets

ZHANG Hao; ZHU Jianwen; LI Xiaoping; BAO Weimin

doi:10.13700/j.bh.1001-5965.2023.0375

Volume 51 Issue 6

Jun. 2025


Journal of Beijing University of Aeronautics and Astronautics > 2025 > 51(6): 2060-2069.

Citation: ZHANG H, ZHU J W, LI X P, et al. Deep reinforcement learning intelligent guidance for intercepting high maneuvering targets[J]. Journal of Beijing University of Aeronautics and Astronautics, 2025, 51(6): 2060-2069 (in Chinese). doi: 10.13700/j.bh.1001-5965.2023.0375


ZHANG Hao¹, ZHU Jianwen¹,², LI Xiaoping¹, BAO Weimin¹,³

1. School of Aerospace Science and Technology, Xidian University, Xi'an 710126, China
2. College of Missile Engineering, Rocket Force University of Engineering, Xi'an 710025, China
3. China Aerospace Science and Technology Corporation, Beijing 100048, China


Corresponding author: E-mail: zhujianwen1117@163.com
Received Date: 15 Jun 2023
Accepted Date: 11 Sep 2023

Available Online: 12 Oct 2023

Publish Date: 27 Sep 2023

Abstract


To address the excessive miss distance and energy loss that arise when a fixed-coefficient proportional navigation law is used to intercept maneuvering targets, an adaptive proportional navigation law is presented whose parameters are adjusted intelligently by deep reinforcement learning. First, a state space based on real-time flight states, an action space containing the lateral and vertical navigation gains, and a reward function integrating different states are established. In the design of the reward function, a prediction-correction method is introduced to improve the accuracy of action evaluation. Second, the soft actor-critic (SAC) algorithm is used to train the network parameters, yielding a guidance-parameter decision system that accounts for both miss distance and energy loss according to the relative motion states of the interceptor and the target. Simulation results show that, compared with traditional proportional navigation guidance, the proposed guidance method adapts well and greatly reduces energy loss while keeping the miss distance low.
Keywords:
- intelligent guidance
- maneuvering targets
- deep reinforcement learning
- proportional navigation
- Markov decision process
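
The idea in the abstract, a proportional navigation law whose lateral and vertical gains are chosen by a learned policy, can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes a textbook two-channel command (gain × closing speed × line-of-sight rate), a hypothetical gain range of [2, 6], and policy actions in [-1, 1] such as a tanh-squashed SAC actor would produce. All function names are illustrative.

```python
import math

# Hypothetical gain bounds; the paper's actual action range is not stated here.
GAIN_MIN, GAIN_MAX = 2.0, 6.0


def action_to_gain(action, lo=GAIN_MIN, hi=GAIN_MAX):
    """Map a policy action in [-1, 1] linearly onto a navigation gain in [lo, hi]."""
    a = max(-1.0, min(1.0, action))  # clip out-of-range actions
    return lo + 0.5 * (a + 1.0) * (hi - lo)


def los_kinematics(rel_pos, rel_vel):
    """Closing speed and line-of-sight (LOS) angle rates.

    rel_pos, rel_vel: target minus interceptor, as (x, y, z) tuples.
    Azimuth is atan2(y, x); elevation is atan2(z, horizontal range).
    Singular when the horizontal range is zero (not handled in this sketch).
    Returns (closing_speed, azimuth_rate, elevation_rate).
    """
    x, y, z = rel_pos
    vx, vy, vz = rel_vel
    rho2 = x * x + y * y   # squared horizontal range
    r2 = rho2 + z * z      # squared slant range
    closing_speed = -(x * vx + y * vy + z * vz) / math.sqrt(r2)
    azimuth_rate = (x * vy - y * vx) / rho2
    elevation_rate = (rho2 * vz - z * (x * vx + y * vy)) / (r2 * math.sqrt(rho2))
    return closing_speed, azimuth_rate, elevation_rate


def pn_command(action_lateral, action_vertical, rel_pos, rel_vel):
    """Two-channel PN acceleration command with per-channel gains set by the policy."""
    n_lat = action_to_gain(action_lateral)
    n_vert = action_to_gain(action_vertical)
    vc, az_rate, el_rate = los_kinematics(rel_pos, rel_vel)
    return n_lat * vc * az_rate, n_vert * vc * el_rate


# Example: target 1000 m ahead, closing at 300 m/s with small cross-range drift.
a_lat, a_vert = pn_command(-0.5, 0.0, (1000.0, 0.0, 0.0), (-300.0, 10.0, 20.0))
```

In this sketch, holding both actions constant recovers a fixed-coefficient law, while a trained policy would supply new actions at each guidance step from the current relative motion state.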




Figures(11) / Tables(5)
