DreamBench++

A Human-Aligned Benchmark for Personalized Image Generation

*Equal contribution.

1 
2 
3 
4 
5 
DreamBench++

Overview of DreamBench++. We collect diverse images and prompts, and utilize GPT-4o for automated evaluation aligned with human preference.

Abstract

Personalized image generation holds great promise in assisting humans in everyday work and life due to its impressive function in creatively generating personalized content. However, current evaluations either are automated but misalign with humans or require human evaluations that are time-consuming and expensive. In this work, we present DreamBench++, a human-aligned benchmark automated by advanced multimodal GPT models. Specifically, we systematically design the prompts to let GPT be both human-aligned and self-aligned, empowered with task reinforcement. Further, we construct a comprehensive dataset comprising diverse images and prompts. By benchmarking 7 modern generative models, we demonstrate that DreamBench++ results in significantly more human-aligned evaluation, helping boost the community with innovative findings.

Diverse Datasets

DreamBench++

Data distribution visualization. (a) Images comparison between DreamBench and DreamBench++ using t-SNE. (b) Image and prompt distribution of DreamBench++.

Human-Aligned Benchmarking

DreamBench++

Overall procedure of prompting GPT-4o for automated evaluation. The evaluation instructions are meta-prompting information written by humans, including task description, scoring criteria, scoring range, and format specification. Then, GPT-4o is prompted with reasoning instructions to perform internal thinking that provides a self-aligned task summary and planning. Finally, all prompts and reasoning outputs are joined with image samples for score outputs.

Leaderboard

Method T2I Model Concept Preservation Prompt Following CP·PF
Animal Human Object Style Overall Photorealistic Style Imaginative Overall
DreamBooth LoRA SDXL v1.0 0.751 0.311 0.543 0.718 0.598 0.898 0.895 0.754 0.865 0.517
IP-Adapter ViT-G SDXL v1.0 0.667 0.558 0.504 0.752 0.593 0.743 0.632 0.446 0.640 0.380
Emu2 SDXL v1.0 0.670 0.546 0.447 0.454 0.528 0.732 0.719 0.560 0.690 0.364
DreamBooth SD v1.5 0.640 0.199 0.488 0.476 0.494 0.789 0.775 0.504 0.721 0.356
IP-Adapter-Plus ViT-H SDXL v1.0 0.900 0.845 0.759 0.912 0.833 0.502 0.384 0.279 0.413 0.344
BLIP-Diffusion SD v1.5 0.673 0.557 0.469 0.507 0.547 0.581 0.510 0.303 0.495 0.271
Textual Inversion SD v1.5 0.502 0.358 0.305 0.358 0.378 0.671 0.686 0.437 0.624 0.236

Quality Result



A photograph of a heron in mid-flight over a misty river

Image 0

A cartoon style illustration of a corgi dressed as a superhero

Image 1

A photograph of a man playing violin in a dimly lit room

Image 2

A photo of a girl playing with a puppy on a grassy field

Image 3

A watercolor painting of a teddy bear dressed as a knight, guarding a castle

Image 4

A photograph of a guitar hanging on a brick wall adorned with vintage posters

Image 5

A detailed photograph of a teapot in a sunlit garden, flowers blooming around

Image 6

A serene riverside scene with fishermen in boats, painted in watercolor sketch

Image 7

A digital illustration of a cat, composed of low-poly geometric patterns

Image 8










Image 0 Image 1 Image 2 Image 3 Image 4 Image 5 Image 6 Image 7 Image 8
Image 0 Image 1 Image 2 Image 3 Image 4 Image 5 Image 6 Image 7 Image 8
Image 0 Image 1 Image 2 Image 3 Image 4 Image 5 Image 6 Image 7 Image 8
Image 0 Image 1 Image 2 Image 3 Image 4 Image 5 Image 6 Image 7 Image 8
Image 0 Image 1 Image 2 Image 3 Image 4 Image 5 Image 6 Image 7 Image 8
Image 0 Image 1 Image 2 Image 3 Image 4 Image 5 Image 6 Image 7 Image 8
Image 0 Image 1 Image 2 Image 3 Image 4 Image 5 Image 6 Image 7 Image 8






Evaluation Results

Method T2I Model Concept Preservation Prompt Following
Human GPT DINO-I CLIP-I Human GPT CLIP-T
Textual Inversion SD v1.5 0.316 0.378±0.0012 0.437 0.726 0.604 0.624±0.0033 0.302
DreamBooth SD v1.5 0.453 0.493±0.0012 0.544 0.753 0.679 0.721±0.0016 0.323
DreamBooth LoRA SDXL v1.0 0.571 0.597±0.0007 0.628 0.784 0.821 0.865±0.0007 0.341
BLIP-Diffusion SD v1.5 0.513 0.547±0.0010 0.649 0.823 0.577 0.495±0.0005 0.286
Emu2 SDXL v1.0 0.410 0.528±0.0016 0.539 0.763 0.641 0.689±0.0010 0.310
IP-Adapter-Plus ViT-H SDXL v1.0 0.755 0.833±0.0008 0.834 0.917 0.541 0.413±0.0005 0.282
IP-Adapter ViT-G SDXL v1.0 0.570 0.593±0.0018 0.667 0.855 0.688 0.640±0.0017 0.309

Human Alignment Degree

Method T2I Model Concept Preservation Kd Prompt Following Kd
H-H G-H D-H C-H H-H G-H C-H
Textual Inversion SD v1.5 0.685 0.544±0.014 0.262 -0.030 0.475 0.461±0.007 0.267
DreamBooth SD v1.5 0.647 0.596±0.003 0.408 0.229 0.516 0.506±0.002 0.185
DreamBooth LoRA SDXL v1.0 0.656 0.641±0.007 0.371 0.321 0.469 0.402±0.001 0.022
BLIP-Diffusion SD v1.5 0.613 0.362±0.017 -0.078 -0.186 0.619 0.541±0.003 0.319
Emu2 SDXL v1.0 0.746 0.669±0.005 0.518 0.258 0.441 0.422±0.011 0.230
IP-Adapter-Plus ViT-H SDXL v1.0 0.602 0.366±0.017 -0.141 -0.150 0.576 0.484±0.006 0.256
IP-Adapter ViT-G SDXL v1.0 0.591 0.458±0.002 -0.073 -0.212 0.509 0.531±0.006 0.196

BibTeX

@article{peng2024dreambench,
  author={Yuang Peng and Yuxin Cui and Haomiao Tang and Zekun Qi and Runpei Dong and Jing Bai and Chunrui Han and Zheng Ge and Xiangyu Zhang and Shu-Tao Xia},
  title={DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation},
  journal={CoRR},
  volume={abs/2406.16855},
  year={2024},
}