Abstract: Although text-to-image models excel at creating realistic images from text, they struggle with long-form text because of token limits in the pretrained text encoder. In this paper, we propose ...