PICTURE: PhotorealistIC virtual Try-on from UnconstRained dEsigns

  • 1 FNii, CUHKSZ
  • 2 SSE, CUHKSZ
  • 3 Xiaobing.AI
  • 4 Cardiff University

Abstract
In this paper, we propose a novel task, virtual try-on from unconstrained designs (ucVTON), to enable photorealistic synthesis of personalized composite clothing on input human images. Unlike prior work, which is constrained to specific input types, our method allows flexible specification of style (text or image) and texture (full garment, cropped sections, or texture patches) conditions. To address the entanglement of style and texture that arises when full garment images are used as conditions, we develop a two-stage pipeline that explicitly disentangles the two. In the first stage, we generate a human parsing map that reflects the desired style, conditioned on the input. In the second stage, we composite textures onto the regions of the parsing map according to the texture input. To represent complex and non-stationary textures, which previous fashion editing works have not achieved, we are the first to propose extracting hierarchical and balanced CLIP features and applying position encoding in VTON. Experiments demonstrate the superior synthesis quality and personalization enabled by our method. The flexible control over style and texture mixing brings virtual try-on to a new level of user experience for online shopping and fashion design.
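The data flow of the two-stage pipeline can be illustrated with a toy sketch. This is not the authors' implementation: the actual method uses learned generative models for both stages, whereas the sketch below swaps in a hand-drawn label map for stage one and naive texture tiling for stage two. The function names, label values, and image sizes are all hypothetical and chosen only for illustration.

```python
import numpy as np

def toy_parsing_map(height, width):
    """Stand-in for stage 1 (assumption: the real stage is a learned model).

    Returns an integer label map with hypothetical labels:
    0 = background, 1 = upper garment, 2 = lower garment.
    """
    parsing = np.zeros((height, width), dtype=np.int64)
    parsing[height // 4 : height // 2, width // 4 : 3 * width // 4] = 1
    parsing[height // 2 : 3 * height // 4, width // 4 : 3 * width // 4] = 2
    return parsing

def composite_textures(image, parsing, textures):
    """Stand-in for stage 2: paste a tiled texture patch into each labelled region.

    Naive tiling is used here only to show how the parsing map routes each
    texture to its region; it does not model non-stationary textures.
    """
    out = image.copy()
    h, w = parsing.shape
    for label, patch in textures.items():
        ph, pw = patch.shape[:2]
        # Repeat the patch enough times to cover the image, then crop.
        reps = (-(-h // ph), -(-w // pw), 1)
        tiled = np.tile(patch, reps)[:h, :w]
        mask = parsing == label
        out[mask] = tiled[mask]
    return out

# Usage: a blank "person" image, one texture patch per garment label.
person = np.zeros((64, 48, 3), dtype=np.uint8)
parsing = toy_parsing_map(64, 48)
textures = {
    1: np.full((8, 8, 3), 200, dtype=np.uint8),  # upper-garment texture
    2: np.full((8, 8, 3), 50, dtype=np.uint8),   # lower-garment texture
}
result = composite_textures(person, parsing, textures)
```

Separating the label map from the texture fill is what lets style and texture be specified independently: changing the parsing map changes the garment shape, while changing the `textures` dictionary changes only its appearance.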
Results

You can select the reference style and corresponding texture to customize the synthesized human images.


Bibtex
@misc{ning2023picture,
  title={PICTURE: PhotorealistIC virtual Try-on from UnconstRained dEsigns},
  author={Shuliang Ning and Duomin Wang and Yipeng Qin and Zirong Jin and Baoyuan Wang and Xiaoguang Han},
  year={2023},
  eprint={2312.04534},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

We referred to the project page of Text2Human when creating this project page.