Visual layout plays a critical role in graphic design fields such as advertising, posters, and web UI design. The recent trend toward content-aware layout generation through generative models has shown promise, yet it often overlooks the semantic intricacies of layout design by treating it as a simple numerical optimization.
To bridge this gap, we introduce PosterLlama, a network designed for generating visually and textually coherent layouts by reformatting layout elements into HTML code and leveraging the rich design knowledge within language models. Furthermore, we enhance the robustness of our model with a unique depth-based poster augmentation strategy. This ensures our generated layouts remain not only semantically rich but also visually appealing, even with limited data.
Our extensive evaluations across several benchmarks demonstrate that PosterLlama outperforms existing methods in producing authentic and content-aware layouts. It supports an unparalleled range of conditions, including content-aware layout generation, element-conditional layout generation, and layout completion, among others, serving as a highly versatile user manipulation tool.
PosterLlama overview. PosterLlama is a high-quality layout generation model built on a vision-language model. It takes a canvas image and a text prompt as input and generates a plausible layout. Our training process consists of two stages: the first stage performs multi-modal alignment training on a large VQA dataset, and the second stage tunes the model for the layout generation task. To leverage the language model's knowledge of graphic design, we convert the layout into a code language format.
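The layout-to-code conversion above can be sketched as follows. This is a minimal illustration of serializing layout elements as HTML so a language model can reason over them as text; the tag names, class attribute, and inline-style format are our assumptions, not the paper's exact schema.

```python
def layout_to_html(elements):
    """Serialize layout elements (category, x, y, w, h) into an
    HTML-like string. The schema here is illustrative only."""
    lines = ["<body>"]
    for cat, x, y, w, h in elements:
        lines.append(
            f'  <div class="{cat}" style="left:{x}px; top:{y}px; '
            f'width:{w}px; height:{h}px"></div>'
        )
    lines.append("</body>")
    return "\n".join(lines)

# Example: a text block and a logo on one canvas
print(layout_to_html([("text", 10, 20, 80, 15), ("logo", 30, 120, 40, 20)]))
```

Representing coordinates as code tokens lets the model exploit HTML/CSS patterns seen during pre-training instead of treating boxes as raw numbers.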
Visual comparison with three baselines: DS-GAN, Layout Prompter, and RADM. PosterLlama generates higher-quality layouts than the other methods. We additionally discuss the information leakage problem of the baselines in our paper.
Our model can perform all types of user-conditioned generation by expressing the condition in text format, which is also a key feature of our method. We visualize the seven conditions. The conditions are as follows: Gen-I performs image-conditioned layout generation, while Gen-IT and Gen-ITS additionally condition on category type and element size. The completion task generates a complete layout from partially placed elements. The recovery task restores a randomly masked layout, with up to 80% of elements masked, and so on. As observed, PosterLlama is adept at handling a variety of user constraints while producing high-quality layouts.
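Because every condition is expressed as text, switching tasks reduces to switching prompts. The sketch below shows one hypothetical way to build such condition prompts; the wording and task keys are illustrative assumptions, not the paper's actual templates.

```python
def build_condition_prompt(task, elements=None):
    """Turn a user constraint into a text prompt. Task keys and
    phrasings are hypothetical illustrations, not the paper's
    exact templates."""
    if task == "gen-i":
        return "Generate a layout for the given canvas image."
    if task == "gen-it":
        cats = ", ".join(e["category"] for e in elements)
        return f"Generate a layout containing these element types: {cats}."
    if task == "completion":
        placed = "; ".join(
            f'{e["category"]} at ({e["x"]},{e["y"]})' for e in elements
        )
        return f"Complete the layout given these placed elements: {placed}."
    raise ValueError(f"unknown task: {task}")

# Example: a Gen-IT style prompt conditioned on element categories
print(build_condition_prompt("gen-it", [{"category": "logo"}, {"category": "text"}]))
```

Encoding constraints this way means new condition types need only a new prompt template, not a new model head.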
(Top) To address the data scarcity issue in PosterLayout generation, we propose image-space augmentation using readily available image generation models. By utilizing depth maps and refined captions, we ensure high-quality image augmentation that preserves the salient objects in the poster. Additionally, we employ Top-k similarity sampling to minimize out-of-distribution examples. (Bottom) The effect of the augmentation technique. Most metrics show performance improvements, and a discussion of the small degradation in some metrics is presented in our paper. Additionally, our augmentation can be used for inpainting-artifact-free evaluation.
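The Top-k similarity sampling step can be sketched as below: keep only the augmented images whose embeddings are most similar to the original poster. Using cosine similarity over generic feature embeddings is our assumption here; the paper's exact feature extractor may differ.

```python
import numpy as np

def topk_similar(query_emb, candidate_embs, k=3):
    """Return indices of the k candidate embeddings most similar
    (cosine similarity) to the query, filtering out likely
    out-of-distribution augmentations. Embeddings stand in for
    features from e.g. an image encoder (an assumption)."""
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    sims = c @ q                      # cosine similarity per candidate
    return np.argsort(-sims)[:k]      # indices, most similar first

# Example: pick the 2 candidates closest to a query direction
print(topk_similar(np.array([1.0, 0.0]),
                   np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]), k=2))
```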
In Content-Aware Layout Generation, public datasets often lack clean backgrounds, requiring us to use inpainting techniques. However, as shown in the image above, this can introduce artifacts and lead to information leakage. Our augmentation method effectively reduces these artifacts. To ensure leakage-free evaluation, we assessed the model on augmented samples, demonstrating superior performance across various metrics.
@article{seol2024posterllama,
title = {PosterLlama: Bridging Design Ability of Language Model to Contents-Aware Layout Generation},
author = {Seol, Jaejung and Kim, Seojun and Yoo, Jaejun},
journal = {ECCV},
year = {2024}
}