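Method | T2I Model | Concept Preservation: Animal | Concept Preservation: Human | Concept Preservation: Object | Concept Preservation: Style | Concept Preservation: Overall | Prompt Following: Photorealistic | Prompt Following: Style | Prompt Following: Imaginative | Prompt Following: Overall | CP·PF |
---|---|---|---|---|---|---|---|---|---|---|---|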
DreamBooth LoRA | SDXL v1.0 | 0.751 | 0.311 | 0.543 | 0.718 | 0.598 | 0.898 | 0.895 | 0.754 | 0.865 | 0.517 |
IP-Adapter ViT-G | SDXL v1.0 | 0.667 | 0.558 | 0.504 | 0.752 | 0.593 | 0.743 | 0.632 | 0.446 | 0.640 | 0.380 |
Emu2 | SDXL v1.0 | 0.670 | 0.546 | 0.447 | 0.454 | 0.528 | 0.732 | 0.719 | 0.560 | 0.690 | 0.364 |
DreamBooth | SD v1.5 | 0.640 | 0.199 | 0.488 | 0.476 | 0.494 | 0.789 | 0.775 | 0.504 | 0.721 | 0.356 |
IP-Adapter-Plus ViT-H | SDXL v1.0 | 0.900 | 0.845 | 0.759 | 0.912 | 0.833 | 0.502 | 0.384 | 0.279 | 0.413 | 0.344 |
BLIP-Diffusion | SD v1.5 | 0.673 | 0.557 | 0.469 | 0.507 | 0.547 | 0.581 | 0.510 | 0.303 | 0.495 | 0.271 |
Textual Inversion | SD v1.5 | 0.502 | 0.358 | 0.305 | 0.358 | 0.378 | 0.671 | 0.686 | 0.437 | 0.624 | 0.236 |
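
The CP·PF column appears to be the product of the two Overall columns (Concept Preservation and Prompt Following). The following is a minimal sketch that checks this reading against the reported values; the product definition is an assumption inferred from the column name and the numbers, and the names `scores` and `cp_pf` are illustrative only.

```python
# Overall scores copied from the table above:
# method -> (Concept Preservation overall, Prompt Following overall, reported CP·PF)
scores = {
    "DreamBooth LoRA": (0.598, 0.865, 0.517),
    "IP-Adapter ViT-G": (0.593, 0.640, 0.380),
    "Emu2": (0.528, 0.690, 0.364),
    "DreamBooth": (0.494, 0.721, 0.356),
    "IP-Adapter-Plus ViT-H": (0.833, 0.413, 0.344),
    "BLIP-Diffusion": (0.547, 0.495, 0.271),
    "Textual Inversion": (0.378, 0.624, 0.236),
}


def cp_pf(cp: float, pf: float) -> float:
    """Assumed definition: product of the two overall scores."""
    return cp * pf


# Recompute the combined score and rank methods from best to worst.
products = {method: cp_pf(cp, pf) for method, (cp, pf, _) in scores.items()}
ranking = sorted(products, key=products.get, reverse=True)

for method in ranking:
    reported = scores[method][2]
    print(f"{method:<24} computed={products[method]:.3f} reported={reported:.3f}")
```

Under this assumption the recomputed products agree with the reported column to within rounding of the three-decimal inputs, and sorting by the product reproduces the row order of the table.
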

Method | T2I Model | Concept Preservation: Human | Concept Preservation: GPT | Concept Preservation: DINO-I | Concept Preservation: CLIP-I | Prompt Following: Human | Prompt Following: GPT | Prompt Following: CLIP-T |
---|---|---|---|---|---|---|---|---|
Textual Inversion | SD v1.5 | 0.316 | 0.378±0.0012 | 0.437 | 0.726 | 0.604 | 0.624±0.0033 | 0.302 |
DreamBooth | SD v1.5 | 0.453 | 0.493±0.0012 | 0.544 | 0.753 | 0.679 | 0.721±0.0016 | 0.323 |
DreamBooth LoRA | SDXL v1.0 | 0.571 | 0.597±0.0007 | 0.628 | 0.784 | 0.821 | 0.865±0.0007 | 0.341 |
BLIP-Diffusion | SD v1.5 | 0.513 | 0.547±0.0010 | 0.649 | 0.823 | 0.577 | 0.495±0.0005 | 0.286 |
Emu2 | SDXL v1.0 | 0.410 | 0.528±0.0016 | 0.539 | 0.763 | 0.641 | 0.689±0.0010 | 0.310 |
IP-Adapter-Plus ViT-H | SDXL v1.0 | 0.755 | 0.833±0.0008 | 0.834 | 0.917 | 0.541 | 0.413±0.0005 | 0.282 |
IP-Adapter ViT-G | SDXL v1.0 | 0.570 | 0.593±0.0018 | 0.667 | 0.855 | 0.688 | 0.640±0.0017 | 0.309 |
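
Method | T2I Model | Concept Preservation KdŌ: H-H | Concept Preservation KdŌ: G-H | Concept Preservation KdŌ: D-H | Concept Preservation KdŌ: C-H | Prompt Following KdŌ: H-H | Prompt Following KdŌ: G-H | Prompt Following KdŌ: C-H |
---|---|---|---|---|---|---|---|---|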
Textual Inversion | SD v1.5 | 0.685 | 0.544±0.014 | 0.262 | -0.030 | 0.475 | 0.461±0.007 | 0.267 |
DreamBooth | SD v1.5 | 0.647 | 0.596±0.003 | 0.408 | 0.229 | 0.516 | 0.506±0.002 | 0.185 |
DreamBooth LoRA | SDXL v1.0 | 0.656 | 0.641±0.007 | 0.371 | 0.321 | 0.469 | 0.402±0.001 | 0.022 |
BLIP-Diffusion | SD v1.5 | 0.613 | 0.362±0.017 | -0.078 | -0.186 | 0.619 | 0.541±0.003 | 0.319 |
Emu2 | SDXL v1.0 | 0.746 | 0.669±0.005 | 0.518 | 0.258 | 0.441 | 0.422±0.011 | 0.230 |
IP-Adapter-Plus ViT-H | SDXL v1.0 | 0.602 | 0.366±0.017 | -0.141 | -0.150 | 0.576 | 0.484±0.006 | 0.256 |
IP-Adapter ViT-G | SDXL v1.0 | 0.591 | 0.458±0.002 | -0.073 | -0.212 | 0.509 | 0.531±0.006 | 0.196 |

@article{peng2024dreambench,
  author={Yuang Peng and Yuxin Cui and Haomiao Tang and Zekun Qi and Runpei Dong and Jing Bai and Chunrui Han and Zheng Ge and Xiangyu Zhang and Shu-Tao Xia},
  title={DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation},
  journal={CoRR},
  volume={abs/2406.16855},
  year={2024},
}