| Citation: | WANG Y Z,Li X,MOU R,et al. Improved SEGAN based ATC speech enhancement algorithm for air traffic control[J]. Journal of Beijing University of Aeronautics and Astronautics,2024,50(12):3930-3939 (in Chinese) doi: 10.13700/j.bh.1001-5965.2022.0874 |
An enhanced version of the Speech Enhancement Generative Adversarial Network (SEGAN) algorithm used in air traffic control (ATC) is suggested in an effort to raise the standard of radiotelephony communication. Aiming at the problem that the traditional SEGAN is submerged under the condition of low signal-to-noise ratio, a multi-stage, multi-mapping, multi-dimensional output generation and multi-scale, multi-discriminator network models are proposed. First, the deep neural network structure is used to extract the speech semantic features, and the ATC speech semantic segmentation is finished. Secondly, set up multiple generators to further optimize the speech signal. Then, a down sampling module is added to the convolutional layer to improve the utilization of speech information by the model and reduce the loss of speech information. Finally, multi-scale, multiple discriminators are used to learn the distribution law and information of speech samples in multiple directions. According to the results, the improved SEGAN model's Short-Time Objective Intelligibility (STOI) and Perceptual Evaluation of Speech Quality (PESQ) are improved by 23.28% and 20.11%, respectively, under low signal-to-noise ratio conditions. This can effectively and swiftly improve ATC speech, making it a good option for follow-up. Provide preparatory work for subsequent Automatic Speech Recognition of ATC.
| [1] |
张军峰, 游录宝, 周铭, 等. 基于点融合系统的多目标进场排序与调度[J]. 北京亚洲成人在线一二三四五六区学报, 2023, 49(1): 66-73.
ZHANG J F, YOU L B, ZHOU M, et al. Multi-objective arrival sequencing and scheduling based on point merge system[J]. Journal of Beijing University of Aeronautics and Astronautics, 2023, 49(1): 66-73 (in Chinese).
|
| [2] |
王钇翔. 面向民航空中管制语音指令的语音增强算法系统研究与应用[D]. 成都: 电子科技大学, 2022: 46-60.
WANG Y X. Research and application of voice enhancement algorithm system for civil aviation air traffic control voice command[D]. Chengdu: University of Electronic Science and Technology of China, 2022: 46-60(in Chinese).
|
| [3] |
中国民航局. 2018年中国航空安全年度报告 [R]. 北京: 中国民航局, 2019.
Civil Aviation Administration of China. Annual report on aviation safety in China, 2018 [R]. Beijing: Civil Aviation Administration of China, 2019(in Chinese).
|
| [4] |
周坤, 陈文杰, 陈伟海, 等. 基于三次样条插值的扩展谱减语音增强算法[J]. 北京亚洲成人在线一二三四五六区学报, 2023, 49(10): 2826-2834.
ZHOU K, CHEN W J, CHEN W H, et al. Spline subtraction speech enhancement based on cubic spline interpolation[J]. Journal of Beijing University of Aeronautics and Astronautics, 2023, 49(10): 2826-2834(in Chinese).
|
| [5] |
KARAM M, KHAZAAL H F, AGLAN H, et al. Noise removal in speech processing using spectral subtraction[J]. Journal of Signal and Information Processing, 2014, 5(2): 32-41. doi: 10.4236/jsip.2014.52006
|
| [6] |
CHEN J D, BENESTY J, HUANG Y T, et al. New insights into the noise reduction Wiener filter[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2006, 14(4): 1218-1234. doi: 10.1109/TSA.2005.860851
|
| [7] |
LIM J S, OPPENHEIM A V. Enhancement and bandwidth compression of noisy speech[J]. Proceedings of the IEEE, 1979, 67(12): 1586-1604. doi: 10.1109/PROC.1979.11540
|
| [8] |
孙琦. 基于子空间的低计算复杂度语音增强算法研究[D]. 长春: 吉林大学, 2017: 18-23.
SUN Q. Research on speech enhancement algorithm with low computational complexity based on subspace[D]. Changchun: Jilin University, 2017: 18-23 (in Chinese).
|
| [9] |
DENDRINOS M, BAKAMIDIS S, CARAYANNIS G. Speech enhancement from noise: A regenerative approach[J]. Speech Communication, 1991, 10(1): 45-57. doi: 10.1016/0167-6393(91)90027-Q
|
| [10] |
TUFTS D W, KUMARESAN R, KIRSTEINS I. Data adaptive signal estimation by singular value decomposition of a data matrix[J]. Proceedings of the IEEE, 1982, 70(6): 684-685. doi: 10.1109/PROC.1982.12367
|
| [11] |
LEE D D, SEUNG H S. Learning the parts of objects by non-negative matrix factorization[J]. Nature, 1999, 401: 788-791. doi: 10.1038/44565
|
| [12] |
娄迎曦, 袁文浩, 时云龙, 等. 融合注意力机制的QRNN语音增强方法[J]. 山东理工大学学报(自然科学版), 2022, 36(3): 7-12.
LOU Y X, YUAN W H, SHI Y L, et al. A speech enhancement method based on QRNN incorporating attention mechanism[J]. Journal of Shandong University of Technology (Natural Science Edition), 2022, 36(3): 7-12 (in Chinese).
|
| [13] |
SCHMIDHUBER J. Deep learning in neural networks: An overview[J]. Neural Networks, 2015, 61: 85-117. doi: 10.1016/j.neunet.2014.09.003
|
| [14] |
SCALART P, FILHO J V. Speech enhancement based on a priori signal to noise estimation[C]// 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings. Piscataway: IEEE Press, 1996: 629-632.
|
| [15] |
XU Y, DU J, DAI L R, et al. An experimental study on speech enhancement based on deep neural networks[J]. IEEE Signal Processing Letters, 2014, 21(1): 65-68. doi: 10.1109/LSP.2013.2291240
|
| [16] |
KANG T G, KWON K, SHIN J W, et al. NMF-based speech enhancement incorporating deep neural network[C]//Interspeech 2014. Singapore: ISCA, 2014.
|
| [17] |
GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks[J]. Communications of the ACM, 2020, 63(11): 139-144. doi: 10.1145/3422622
|
| [18] |
PASCUAL S, BONAFONTE A, SERRÀ J. SEGAN: Speech enhancement generative adversarial network[EB/OL]. (2017-06-09)[2022-10-30].
|
| [19] |
尹文兵, 高戈, 曾邦, 等. 基于时频域生成对抗网络的语音增强算法[J]. 计算机科学, 2022, 49(6): 187-192. doi: 10.11896/jsjkx.210500114
YIN W B, GAO G, ZENG B, et al. Speech enhancement based on time-frequency domain GAN[J]. Computer Science, 2022, 49(6): 187-192 (in Chinese). doi: 10.11896/jsjkx.210500114
|
| [20] |
李晓理, 张博, 王康, 等. 人工智能的发展及应用[J]. 北京工业大学学报, 2020, 46(6): 583-590.
LI X L, ZHANG B, WANG K, et al. Development and application of artificial intelligence[J]. Journal of Beijing University of Technology, 2020, 46(6): 583-590 (in Chinese).
|
| [21] |
QUAN T M, NGUYEN-DUC T, JEONG W K. Compressed sensing MRI reconstruction using a generative adversarial network with a cyclic loss[J]. IEEE Transactions on Medical Imaging, 2018, 37(6): 1488-1497. doi: 10.1109/TMI.2018.2820120
|
| [22] |
PHAN H, MCLOUGHLIN I V, PHAM L, et al. Improving GANs for speech enhancement[J]. IEEE Signal Processing Letters, 2020, 27: 1700-1704. doi: 10.1109/LSP.2020.3025020
|
| [23] |
PANDEY A, WANG D L. On adversarial training and loss functions for speech enhancement[C]// 2018 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE Press, 2018: 5414-5418.
|