Volume 49 Issue 8
Aug.  2023
LI Y H,ZHU M Y,REN J,et al. Text-to-image synthesis based on modified deep convolutional generative adversarial network[J]. Journal of Beijing University of Aeronautics and Astronautics,2023,49(8):1875-1883 (in Chinese) doi: 10.13700/j.bh.1001-5965.2021.0588

Text-to-image synthesis based on modified deep convolutional generative adversarial network

doi: 10.13700/j.bh.1001-5965.2021.0588
Funds: National Natural Science Foundation of China (61902301); Key Project of Natural Science Basic Research Plan in Shaanxi Province of China (2022JZ-35)
More Information
  • Corresponding author: E-mail: hitliyunhong@163.com
  • Received Date: 01 Oct 2021
  • Accepted Date: 24 Dec 2021
  • Publish Date: 07 Feb 2022
  • When high-dimensional texts are adopted as input, images generated by the previously proposed deep convolutional generative adversarial network (DCGAN) usually suffer from distortion and structural degradation owing to the sparsity of the texts, which severely degrades generative performance. To address this issue, an improved model, CA-DCGAN, is proposed. A deep convolutional network and a recurrent text encoder are jointly employed to encode the input text into a text-embedding representation. A conditioning augmentation (CA) module is then introduced to generate an additional conditioning variable that replaces the original high-dimensional text feature, and this conditioning variable is combined with random noise as the input of the generator. To avoid over-fitting and promote convergence, a KL regularization term is added to the generator's loss. Moreover, a spectral normalization (SN) layer is adopted in the discriminator to prevent the mode collapse caused by unbalanced training when the discriminator's gradient descends too fast. Experiments on the Oxford-102-flowers and CUB-200 datasets show that the proposed model outperforms alignDRAW, GAN-CLS, GAN-INT-CLS, StackGAN (64×64), and StackGAN-v1 (64×64) in terms of the quality of generated images: on the two datasets, the inception score increases by at least 10.9% and 5.6% and by at most 41.4% and 37.5%, respectively, while the FID decreases by at least 11.4% and 8.4% and by at most 43.9% and 42.5%, which further validates the effectiveness of the proposed method.
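The conditioning-augmentation step described above maps the sparse, high-dimensional text embedding to a low-dimensional Gaussian condition variable via the reparameterization trick, with a KL regularizer pulling that Gaussian toward a standard normal. The following is a minimal NumPy sketch under stated assumptions, not the paper's implementation: the projection matrices `w_mu`/`w_logvar` and all dimensions (1024-d embedding, 128-d condition, 100-d noise) are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def conditioning_augmentation(text_embedding, w_mu, w_logvar):
    """Turn a text embedding into a smoothed condition variable c.

    mu and logvar parameterize a diagonal Gaussian; sampling with the
    reparameterization trick keeps the operation differentiable.
    """
    mu = text_embedding @ w_mu            # mean of the condition Gaussian
    logvar = text_embedding @ w_logvar    # log-variance (numerically stable)
    sigma = np.exp(0.5 * logvar)
    eps = rng.standard_normal(mu.shape)   # fresh noise per sample
    c = mu + sigma * eps                  # sampled condition variable
    # KL(N(mu, sigma^2) || N(0, I)) -- added to the generator loss as a
    # regularizer; always non-negative.
    kl = 0.5 * np.sum(mu**2 + np.exp(logvar) - logvar - 1.0)
    return c, kl

# Toy usage: hypothetical 1024-d embedding -> 128-d condition variable,
# concatenated with 100-d random noise to form the generator input.
phi = rng.standard_normal(1024)
w_mu = 0.01 * rng.standard_normal((1024, 128))
w_logvar = 0.01 * rng.standard_normal((1024, 128))
c, kl = conditioning_augmentation(phi, w_mu, w_logvar)
z = rng.standard_normal(100)
generator_input = np.concatenate([c, z])
```

Replacing the raw text feature with this sampled variable gives the generator many valid conditioning vectors per caption, which is what counteracts the sparsity of the text input.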

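The spectral normalization used in the discriminator constrains each weight matrix to spectral norm 1 by dividing it by its largest singular value, which in practice is estimated cheaply by power iteration. A minimal NumPy sketch follows; the matrix shape and iteration count are illustrative assumptions, not values from the paper.

```python
import numpy as np

def spectral_normalize(w, n_iter=50):
    """Scale w by its largest singular value, estimated via power iteration.

    This caps the Lipschitz constant of the layer at ~1, stabilizing
    discriminator training.
    """
    u = np.ones(w.shape[0]) / np.sqrt(w.shape[0])  # left singular-vector guess
    for _ in range(n_iter):
        v = w.T @ u
        v /= np.linalg.norm(v) + 1e-12
        u = w @ v
        u /= np.linalg.norm(u) + 1e-12
    sigma = u @ w @ v                              # ~ largest singular value
    return w / sigma

rng = np.random.default_rng(1)
w = rng.standard_normal((64, 32))   # hypothetical discriminator weight
w_sn = spectral_normalize(w)        # spectral norm of w_sn is ~1
```

Libraries such as PyTorch ship this as a reusable wrapper (`torch.nn.utils.spectral_norm`), typically running a single power-iteration step per forward pass rather than many.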
  • [1]
    ZHOU K Y, YANG Y X, HOSPEDALES T, et al. Deep domain-adversarial image generation for domain generalisation[C]//34th AAAI Conference on Artificial Intelligence/32nd Innovative Applications of Artificial Intelligence Conference/10th AAAI Symposium on Educational Advances in Artificial Intelligence. Palo Alto: AAAI, 2020, 34: 13025-13032.
    [2]
    陆婷婷, 李潇, 张尧, 等. 基于三维点云模型的空间目标光学图像生成技术[J]. 北京亚洲成人在线一二三四五六区学报, 2020, 46(2): 274-286.

    LU T T, LI X, ZHANG Y, et al. A technology for generation of space object optical image based on 3D point cloud model[J]. Journal of Beijing University of Aeronautics and Astronautics, 2020, 46(2): 274-286(in Chinese).
    [3]
    ZHANG Z, XIE Y, YANG L. Photographic text-to-image synthesis with a hierarchically-nested adversarial network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 6199-6208.
    [4]
    牛蒙蒙, 沈明瑞, 秦波, 等. 基于GAN的刀具状态监测数据集增强方法[J]. 组合机床与自动化加工技术, 2021(4): 113-115. doi: 10.13462/j.cnki.mmtamt.2021.04.027

    NIU M M, SHEN M R, QIN B, et al. A data augmentation method based on GAN in tool condition monitoring[J]. Combined Machine Tool and Automatic Machining Technology, 2021(4): 113-115(in Chinese). doi: 10.13462/j.cnki.mmtamt.2021.04.027
    [5]
    VENDROV I, KIROS R, FIDLER S, et al. Order-embeddings of images and language[EB/OL]. (2016-03-01)[2021-09-01].
    [6]
    MANSIMOV E, PARISOTTO E, BA J L, et al. Generating images from captions with attention[EB/OL]. (2016-02-29)[2021-09-01].
    [7]
    GREGOR K, DANIHELKA I, GRAVES A, et al. DRAW: A recurrent neural network for image generation[C]//Proceedings of the 32nd International Conference on Machine Learning. New York: ACM, 2015: 1462-1471.
    [8]
    REED S, VAN DEN OORD A, KALCHBRENNER N, et al. Generating interpretable images with controllable structure[C]//5th International Conference on Learning Representations, Appleton, WI: ICLR, 2016.
    [9]
    NGUYEN A, CLUNE J, BENGIO Y, et al. Plug & play generative networks: Conditional iterative generation of images in latent space[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 17355648.
    [10]
    GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2014: 2672-2680.
    [11]
    MIRZA M, OSINDERO S. Conditional generative adversarial nets[EB/OL]. (2014-10-06)[2021-09-01].
    [12]
    SCHUSTER M, PALIWAL K K. Bidirectional recurrent neural networks[J]. IEEE Transactions on Signal Processing, 2002, 45(11): 2673-2681.
    [13]
    RADFORD A, METZ L, CHINTALA S. Unsupervised representation learning with deep convolutional generative adversarial networks[C]//4th International Conference on Learning Representations, Appleton, WI: ICLR, 2016.
    [14]
    REED S, AKATA Z, YAN X, et al. Generative adversarial text to image synthesis[C]//Proceedings of the 33rd International Conference on Machine Learning. New York: ACM, 2016: 1060-1069.
    [15]
    REED S, AKATA Z, LEE H, et al. Learning deep representations of fine-grained visual descriptions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 49-58.
    [16]
    NILSBACK M E, ZISSERMAN A. Automated flower classification over a large number of classes[C]//Proceedings of the IEEE Conference on Computer Vision, Graphics and Image Processing. Piscataway: IEEE Press, 2008: 722-729.
    [17]
    WAH C, BRANSON S, WELINDER P, et al. The Caltech-UCSD birds-200-2011 dataset: CNS-TR-2011-001[R]. Pasadena: California Institute of Technology, 2011.
    [18]
    SALIMANS T, GOODFELLOW I, ZAREMBA W, et al. Improved techniques for training GANs[C]//30th Conference on Neural Information Processing Systems, Cambridge: MIT Press, 2016: 2234-2242.
    [19]
    HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2017: 6626-6637.
    [20]
    ZHANG H, XU T, LI H, et al. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 5907-5915.
    [21]
    ZHANG H, XU T, LI H, et al. StackGAN++: Realistic image synthesis with stacked generative adversarial networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1947-1962. doi: 10.1109/TPAMI.2018.2856256

    Figures(9)  / Tables(4)
