| Citation: | SUN D, GAO D, ZHENG J H, et al. UAV reinforcement learning control algorithm with demonstrations[J]. Journal of Beijing University of Aeronautics and Astronautics, 2023, 49(6): 1424-1433 (in Chinese). doi: 10.13700/j.bh.1001-5965.2021.0466 |
The practical application of reinforcement learning (RL) in unmanned aerial vehicle (UAV) control is restricted by low learning efficiency. An algorithm integrating RL with imitation learning was proposed to improve the performance of autonomous flight control systems. By establishing new loss and value functions, demonstrations were incorporated as supervisory signals in the updating of the actor and critic networks. Two replay buffers were used to store demonstration data and the data generated by interaction with the environment, respectively. A prioritized experience replay mechanism promoted the use of high-quality data and adjusted the ratio at which the two kinds of experience were utilized during learning. Simulation results showed that the RL control algorithm with demonstrations quickly obtained high rewards in the early stage of training and maintained higher rewards than the conventional RL algorithm throughout training. The control strategy obtained by the proposed algorithm had a faster response and higher control precision. Demonstrations improved both the performance of the algorithm and the learning efficiency of the UAV autonomous control system, making effective control strategies easier to learn. The addition of demonstrations also expanded the experience data and increased the stability of the algorithm, making the UAV autonomous control system robust to the setting of the reward function.
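The dual-buffer arrangement described above can be illustrated with a minimal sketch. This is not the paper's implementation: the class and function names are hypothetical, prioritization is reduced to simple proportional sampling, and the mixing ratio between demonstration and interaction data is taken as a fixed parameter rather than being adjusted during learning.

```python
import random

class PrioritizedBuffer:
    """Minimal proportional prioritized replay buffer (illustrative sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []        # stored transitions
        self.priorities = []  # one priority per transition

    def add(self, transition, priority=1.0):
        # Evict the oldest entry once capacity is reached.
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(priority)

    def sample(self, k):
        # Sample k transitions (with replacement) with probability
        # proportional to their priorities.
        return random.choices(self.data, weights=self.priorities, k=k)

def sample_mixed_batch(demo_buf, agent_buf, batch_size, demo_ratio):
    """Draw a training batch mixing demonstration and interaction data.

    demo_ratio is the fraction of the batch taken from the demonstration
    buffer; the remainder comes from the agent's interaction buffer.
    """
    n_demo = int(batch_size * demo_ratio)
    batch = demo_buf.sample(n_demo) + agent_buf.sample(batch_size - n_demo)
    random.shuffle(batch)
    return batch
```

In this sketch the demonstration transitions would feed a supervisory (behavior-cloning) term in the actor update, while both kinds of transitions contribute to the critic's temporal-difference target.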