Dual-channel vision Transformer-based image style transfer

JI Zongxing, BEI Jia, LIU Runze, REN Tongwei

Citation: JI Z X, BEI J, LIU R Z, et al. Dual-channel vision Transformer-based image style transfer[J]. Journal of Beijing University of Aeronautics and Astronautics, 2025, 51(7): 2488-2497 (in Chinese). doi: 10.13700/j.bh.1001-5965.2023.0392

doi: 10.13700/j.bh.1001-5965.2023.0392
Funds: National Natural Science Foundation of China (62072232); the Fundamental Research Funds for the Central Universities (021714380026); the Collaborative Innovation Center of Novel Software Technology and Industrialization

  • Corresponding author. E-mail: beijia@nju.edu.cn
  • CLC number: TP301

  • Abstract:

    Image style transfer adjusts the visual attributes of a content image according to a style image, so that the result preserves the original content while exhibiting the target style, yielding a visually appealing stylized image. Most existing representative methods neither account for the encoding differences between image domains nor capture global context, focusing instead on local feature extraction. To address this, we propose Bi-Trans, a novel dual-channel vision Transformer-based image style transfer method. Bi-Trans encodes the content and style image domains independently, extracts a style parameter vector as a discrete representation of image style, and calibrates the content image to the target style domain via a cross-attention mechanism and conditional instance normalization (CIN), generating the stylized image. Experimental results show that the proposed method outperforms existing methods in both content preservation and style fidelity.
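    The paper's full pipeline is not reproduced on this page. As a rough illustration of the two calibration operations the abstract names, the PyTorch sketch below combines cross-attention (content tokens query style tokens) with conditional instance normalization driven by a style parameter vector; the module names, dimensions, and residual wiring are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of cross-attention followed by conditional instance
# normalization (CIN), the two operations named in the abstract.
# All dimensions and names here are illustrative assumptions.
import torch
import torch.nn as nn

class StyleCalibration(nn.Module):
    def __init__(self, dim=512, style_dim=256, n_heads=8):
        super().__init__()
        # Content tokens attend to style tokens: queries come from the
        # content stream, keys/values from the style stream.
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        # CIN: the style vector predicts per-channel affine parameters,
        # replacing the fixed affine of ordinary instance normalization.
        self.norm = nn.InstanceNorm1d(dim, affine=False)
        self.to_gamma = nn.Linear(style_dim, dim)
        self.to_beta = nn.Linear(style_dim, dim)

    def forward(self, content_tokens, style_tokens, style_vec):
        # content_tokens: (B, N, D); style_tokens: (B, M, D); style_vec: (B, style_dim)
        attended, _ = self.cross_attn(content_tokens, style_tokens, style_tokens)
        x = content_tokens + attended                     # residual connection
        x = self.norm(x.transpose(1, 2)).transpose(1, 2)  # normalize each channel
        gamma = self.to_gamma(style_vec).unsqueeze(1)     # (B, 1, D)
        beta = self.to_beta(style_vec).unsqueeze(1)
        return gamma * x + beta                           # style-conditioned affine
```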

     

  • Figure 1. Comparison of stylization results between the proposed method and representative methods

    Figure 2. Flowchart of the image style transfer method based on dual-channel vision Transformer

    Figure 3. Encoder-decoder architecture of the vision Transformer

    Figure 4. Examples of arbitrary style transfer using the proposed method

    Figure 5. Comparison of style transfer results between the proposed method and existing methods

    Figure 6. Comparison of style transfer results under different ablation settings

    Table 1. Comparison of average stylization loss between the proposed method and CNN-based methods

    Method            $\mathscr{L}_{\mathrm{con}}$   $\mathscr{L}_{\mathrm{sty}}^{\mu,\sigma}$   $\mathscr{L}_{\mathrm{sty}}^{\mathrm{Gram}}$
    Ghiasi[10]        0.94                           2.81                                        12.51
    AdaIN[11]         0.91                           1.39                                        11.63
    SANet[13]         0.97                           2.88                                        16.18
    Proposed method   0.69                           1.12                                        7.31
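    The three columns in Tables 1-3 follow loss conventions common in the style transfer literature: a content loss over encoder features, a mean/standard-deviation style loss (AdaIN-style statistics matching), and a Gram-matrix style loss. The sketch below shows these conventional definitions; the paper's actual feature layers, loss weights, and implementation details are not given on this page, so the helper functions are assumptions.

```python
# Conventional definitions of the three losses reported in Tables 1-3.
# feat_* are (B, C, H, W) feature maps from a pretrained encoder (e.g. VGG);
# the exact layers and weights used in the paper are not specified here.
import torch
import torch.nn.functional as F

def content_loss(feat_out, feat_content):
    # L_con: MSE between features of the output and the content image.
    return F.mse_loss(feat_out, feat_content)

def mean_std_style_loss(feat_out, feat_style, eps=1e-5):
    # L_sty^{mu,sigma}: match per-channel feature mean and std.
    mu_o, mu_s = feat_out.mean(dim=(2, 3)), feat_style.mean(dim=(2, 3))
    sd_o = feat_out.var(dim=(2, 3), unbiased=False).add(eps).sqrt()
    sd_s = feat_style.var(dim=(2, 3), unbiased=False).add(eps).sqrt()
    return F.mse_loss(mu_o, mu_s) + F.mse_loss(sd_o, sd_s)

def gram_style_loss(feat_out, feat_style):
    # L_sty^{Gram}: match normalized Gram matrices G = F F^T of the features.
    def gram(f):
        b, c, h, w = f.shape
        f = f.reshape(b, c, h * w)
        return f @ f.transpose(1, 2) / (c * h * w)
    return F.mse_loss(gram(feat_out), gram(feat_style))
```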

    Table 2. Comparison of average stylization loss between the proposed method and vision Transformer-based methods

    Method            $\mathscr{L}_{\mathrm{con}}$   $\mathscr{L}_{\mathrm{sty}}^{\mu,\sigma}$   $\mathscr{L}_{\mathrm{sty}}^{\mathrm{Gram}}$
    StyTr2[16]        0.79                           1.35                                        9.86
    S2WAT[17]         0.93                           2.64                                        13.47
    STTR[18]          0.79                           3.86                                        27.58
    Proposed method   0.69                           1.12                                        7.31

    Table 3. Comparison of average stylization loss under different ablation settings

    Method            $\mathscr{L}_{\mathrm{con}}$   $\mathscr{L}_{\mathrm{sty}}^{\mu,\sigma}$   $\mathscr{L}_{\mathrm{sty}}^{\mathrm{Gram}}$
    Ablation (a)      0.70                           1.02                                        3.15
    Ablation (b)      0.64                           1.37                                        4.71
    Ablation (c)      1.88                           11.36                                       18.61
    Ablation (d)      1.93                           6.75                                        16.05
    Full method       0.69                           0.94                                        3.02
  • [1] ZHANG Y, HUANG N, TANG F, et al. Inversion-based style transfer with diffusion models[C]//IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2023: 10146-10156.
    [2] ZHANG Y X, DONG W M, TANG F, et al. ProSpect: prompt spectrum for attribute-aware personalization of diffusion models[J]. ACM Transactions on Graphics, 2023, 42(6): 1-14.
    [3] WANG Z, ZHAO L, XING W. StyleDiffusion: controllable disentangled style transfer via diffusion models[C]//IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2023: 7677-7689.
    [4] ROMBACH R, BLATTMANN A, LORENZ D, et al. High-resolution image synthesis with latent diffusion models[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2022: 10674-10685.
    [5] JING Y C, YANG Y Z, FENG Z L, et al. Neural style transfer: a review[J]. IEEE Transactions on Visualization and Computer Graphics, 2020, 26(11): 3365-3385. doi: 10.1109/TVCG.2019.2921336
    [6] GATYS L A, ECKER A S, BETHGE M. Image style transfer using convolutional neural networks[C]//IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 2414-2423.
    [7] JOHNSON J, ALAHI A, LI F F. Perceptual losses for real-time style transfer and super-resolution[C]//Computer Vision-ECCV. Cham: Springer, 2016: 694-711.
    [8] ULYANOV D, LEBEDEV V, VEDALDI A, et al. Texture networks: feed-forward synthesis of textures and stylized images[EB/OL]. (2016-03-10)[2023-01-10]. http://arxiv.org/abs/1603.03417v1.
    [9] LIN T W, MA Z Q, LI F, et al. Drafting and revision: Laplacian pyramid network for fast high-quality artistic style transfer[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 5137-5146.
    [10] GHIASI G, LEE H, KUDLUR M, et al. Exploring the structure of a real-time arbitrary neural artistic stylization network[C]//British Machine Vision Conference. Great Britain: BMVA, 2017: 1-27.
    [11] HUANG X, BELONGIE S. Arbitrary style transfer in real-time with adaptive instance normalization[C]//IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 1510-1519.
    [12] LI Y, FANG C, YANG J, et al. Universal style transfer via feature transforms[C]//Annual Conference on Neural Information Processing Systems. La Jolla: NIPS, 2017: 1-11.
    [13] PARK D Y, LEE K H. Arbitrary style transfer with style-attentional networks[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 5873-5881.
    [14] LIU S H, LIN T W, HE D L, et al. AdaAttN: revisit attention mechanism in arbitrary neural style transfer[C]//IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2021: 6629-6638.
    [15] CHANDRAN P, ZOSS G, GOTARDO P, et al. Adaptive convolutions for structure-aware style transfer[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 7968-7977.
    [16] DENG Y Y, TANG F, DONG W M, et al. StyTr2: image style transfer with transformers[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2022: 11316-11326.
    [17] ZHANG C, YANG J, WANG L, et al. S2WAT: image style transfer via hierarchical vision transformer using Strips Window Attention[EB/OL]. (2022-11-07)[2023-06-19]. http://arxiv.org/abs/2210.12381.
    [18] WANG J B, YANG H, FU J L, et al. Fine-grained image style transfer with visual transformers[C]//Computer Vision-ACCV. Cham: Springer, 2023: 427-443.
    [19] ZHANG C Y, DAI Z Y, CAO P, et al. Edge enhanced image style transfer via transformers[C]//Proceedings of the ACM International Conference on Multimedia Retrieval. New York: ACM, 2023: 105-114.
    [20] FENG J X, ZHANG G, LI X H, et al. A compositional transformer based autoencoder for image style transfer[J]. Electronics, 2023, 12(5): 1184. doi: 10.3390/electronics12051184
    [21] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324. doi: 10.1109/5.726791
    [22] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90. doi: 10.1145/3065386
    [23] GEIRHOS R, RUBISCH P, MICHAELIS C, et al. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness[C]//Proceedings of the International Conference on Learning Representations. Washington DC: ICLR, 2019: 1-22.
    [24] WEI H P, DENG Y Y, TANG F, et al. A comparative study of CNN- and transformer-based visual style transfer[J]. Journal of Computer Science and Technology, 2022, 37(3): 601-614. doi: 10.1007/s11390-022-2140-7
    [25] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. (2021-06-03)[2023-06-19]. http://arxiv.org/abs/2010.11929.
    [26] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems. La Jolla: NIPS, 2017: 5998-6008.
    [27] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]//Computer Vision-ECCV. Cham: Springer, 2014: 740-755.
    [28] PHILLIPS F, MACKINTOSH B. Wiki art gallery, inc. a case for critical thinking[J]. Issues in Accounting Education, 2011, 26(3): 593-608. doi: 10.2308/iace-50038
Figures (6) / Tables (3)
Metrics
  • Article views: 390
  • Full-text HTML views: 85
  • PDF downloads: 33
  • Citations: 0
Publication history
  • Received: 2023-06-19
  • Accepted: 2024-01-19
  • Published online: 2024-03-09
  • Issue published: 2025-07-31
