ConsistentID

ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

1. Shenzhen Campus of Sun Yat-sen University, 2. Zhuhai Campus of Sun Yat-sen University, 3. Lenovo Research, 4. Inception Institute of Artificial Intelligence

Abstract

Diffusion-based technologies have made significant strides, particularly in personalized and customized facial generation. However, existing methods face challenges in achieving high-fidelity and detailed identity (ID) consistency, primarily due to insufficient fine-grained control over facial areas and the lack of a comprehensive ID-preservation strategy that considers both intricate facial details and the overall face. To address these limitations, we introduce ConsistentID, an innovative method crafted for diverse identity-preserving portrait generation under fine-grained multimodal facial prompts, utilizing only a single reference image. ConsistentID comprises two key components: a multimodal facial prompt generator that combines facial features, corresponding facial descriptions, and the overall facial context to enhance precision in facial details, and an ID-preservation network optimized through a facial attention localization strategy, aimed at preserving ID consistency in facial regions. Together, these components significantly enhance the accuracy of ID preservation by introducing fine-grained multimodal ID information from facial regions. To facilitate training of ConsistentID, we present a fine-grained portrait dataset, FGID, with over 500,000 facial images, offering greater diversity and comprehensiveness than existing public facial datasets such as LAION-Face, CelebA, FFHQ, and SFHQ. Experimental results substantiate that ConsistentID achieves exceptional precision and diversity in personalized facial generation, surpassing existing methods on the MyStyle dataset. Furthermore, although ConsistentID introduces more multimodal ID information, it maintains a fast inference speed during generation.
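
The abstract does not spell out how the facial attention localization strategy is computed. One plausible reading is that the attention of each facial-region token (for example eyes, nose, mouth) is encouraged to stay inside that region's mask. The sketch below illustrates that idea only; the function name, the array shapes, and the form of the penalty are assumptions made for illustration and are not taken from the authors' implementation.

```python
import numpy as np


def attention_localization_loss(attn_maps, region_masks):
    """Hypothetical penalty: mean fraction of attention falling outside each region.

    attn_maps:    (K, H, W) non-negative attention of K facial-region tokens
                  over image positions (assumed layout).
    region_masks: (K, H, W) binary masks, 1 inside the region of token k.
    """
    attn = np.asarray(attn_maps, dtype=np.float64)
    masks = np.asarray(region_masks, dtype=np.float64)
    if attn.shape != masks.shape:
        raise ValueError("attention maps and masks must have the same shape")

    total = attn.sum(axis=(1, 2))                      # attention mass per token
    outside = (attn * (1.0 - masks)).sum(axis=(1, 2))  # mass outside the region
    # Tokens with no attention mass contribute zero instead of dividing by zero.
    frac = np.where(total > 0, outside / np.maximum(total, 1e-12), 0.0)
    return float(frac.mean())


# Toy example with two region tokens on a 4x4 grid.
masks = np.zeros((2, 4, 4))
masks[0, :2, :2] = 1.0   # token 0: top-left region
masks[1, 2:, 2:] = 1.0   # token 1: bottom-right region

attn = np.zeros((2, 4, 4))
attn[0, :2, :2] = 0.25   # token 0 attends only inside its region
attn[1] = 1.0 / 16.0     # token 1 attends uniformly over the whole grid

loss = attention_localization_loss(attn, masks)
```

In this toy case the first token contributes no penalty, while the second places 12 of its 16 equal attention cells outside its region, so a penalty of this form would push its attention back toward the masked area.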

Application

Comparison on two downstream applications.

Application cases of ConsistentID for identity confusion.

Application cases of ConsistentID for bringing old photos back to life.

Application cases of ConsistentID for altering the age attribute of a character.

Application on SFHQ test set.

Comparison

Qualitative comparison of universal recontextualization samples between our approach and other methods, using five distinct identities and their corresponding prompts. ConsistentID exhibits a stronger capability for high-quality generation, flexible editability, and identity fidelity.

Qualitative comparison of our model with other models on two special tasks: stylization and action instruction.

Comparison with additional fine-tuning-based models.

Comparison of ConsistentID with IP-Adapter and its face version variants conditioned on different styles.

Visualization in recontextualization settings. These examples demonstrate the high identity fidelity and text-editing capability of ConsistentID.

Ablation experiment

ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

Given a single reference image for each input ID, ConsistentID can generate diverse personalized ID images based on text prompts.

Facial feature details

Comparison of facial feature details between our method and existing approaches.

Framework

The overall framework of our proposed ConsistentID.