Model · OpenAI · published Jul 5, 2023

openai/diffusers-cd_imagenet64_l2

license: mit · library: diffusers · downloads: 29 · likes: 7

Disclaimer: This model was added by the amazing community contributors dg845 and ayushtues❤️

Consistency models are a new class of generative models introduced in "Consistency Models" (paper, code) by Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. From the paper abstract:

> Diffusion models have significantly advanced the fields of image, audio, and video generation, but they depend on an iterative sampling process that causes slow generation. To overcome this limitation, we propose consistency models, a new family of models that generate high quality samples by directly mapping noise to data. They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality. They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training on these tasks. Consistency models can be trained either by distilling pre-trained diffusion models, or as standalone generative models altogether. Through extensive experiments, we demonstrate that they outperform existing distillation techniques for diffusion models in one- and few-step sampling, achieving the new state-of-the-art FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64 x 64 for one-step generation. When trained in isolation, consistency models become a new family of generative models that can outperform existing one-step, non-adversarial generative models on standard benchmarks such as CIFAR-10, ImageNet 64 x 64 and LSUN 256 x 256.

Intuitively, a consistency model can be thought of as a model which, when evaluated on a noisy image and timestep, returns an output image sample similar to that which would be returned by running a sampling algorithm on a diffusion model. Consistency models can be parameterized by any neural network whose input has the same dimensionality as its output, such as a U-Net.
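
One way to turn an arbitrary same-shape network into a consistency model is a skip-connection parameterization of the form described in the paper, which forces the model to return its input unchanged at the smallest noise level. The sketch below is illustrative only: `toy_network` is a hypothetical stand-in for a U-Net, images are flat Python lists, and the constants `SIGMA_DATA` and `EPSILON` are assumed values rather than settings read from this checkpoint.

```python
import math

SIGMA_DATA = 0.5  # assumed data standard deviation
EPSILON = 0.002   # assumed smallest noise level


def c_skip(t):
    # Weight on the noisy input; equals 1 at t == EPSILON.
    return SIGMA_DATA**2 / ((t - EPSILON) ** 2 + SIGMA_DATA**2)


def c_out(t):
    # Weight on the network output; equals 0 at t == EPSILON.
    return SIGMA_DATA * (t - EPSILON) / math.sqrt(SIGMA_DATA**2 + t**2)


def consistency_fn(network, x, t):
    """f(x, t) = c_skip(t) * x + c_out(t) * network(x, t), element-wise."""
    raw = network(x, t)
    return [c_skip(t) * xi + c_out(t) * ri for xi, ri in zip(x, raw)]


def toy_network(x, t):
    # Hypothetical stand-in for a U-Net: output has the same size as the input.
    return [math.tanh(xi) for xi in x]


x = [0.3, -1.2, 2.0]
at_boundary = consistency_fn(toy_network, x, EPSILON)  # identical to x
at_high_noise = consistency_fn(toy_network, x, 10.0)   # dominated by the network
```

Because the weights are fixed functions of the timestep, the boundary condition holds for any network, trained or not.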

More precisely, given a teacher diffusion model and a fixed sampler, we can train ("distill") a consistency model so that, given a noisy image and its corresponding timestep, its output is close to the sample obtained by running the sampler on the diffusion model from that same noisy image and timestep. The authors call this procedure "consistency distillation (CD)". Consistency models can also be trained from scratch to generate clean images from a noisy image and timestep, which the authors call "consistency training (CT)".
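
In symbols, the idea can be written roughly as follows. The notation follows the paper as best recalled and does not appear in this model card: $f_\theta$ is the consistency model, $\theta^-$ a slowly updated copy of its weights, $\phi$ the teacher diffusion model, $d$ the chosen distance, $\lambda$ a weighting function, and $\epsilon$ the smallest noise level.

```latex
% Self-consistency: all points on one sampling trajectory map to the same output,
% and the model is the identity at the smallest noise level.
f_\theta(x_t, t) = f_\theta(x_{t'}, t') \quad \text{for all } t, t' \in [\epsilon, T],
\qquad f_\theta(x_\epsilon, \epsilon) = x_\epsilon .

% Consistency distillation: \hat{x}^{\phi}_{t_n} is obtained from x_{t_{n+1}}
% by one step of the fixed sampler applied to the teacher model.
\mathcal{L}_{\mathrm{CD}}(\theta, \theta^-; \phi)
  = \mathbb{E}\!\left[ \lambda(t_n)\,
      d\!\left( f_\theta(x_{t_{n+1}}, t_{n+1}),\;
                f_{\theta^-}(\hat{x}^{\phi}_{t_n}, t_n) \right) \right].
```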

This model is a diffusers-compatible version of the cd_imagenet64_l2.pt checkpoint from the original code and model release. It was distilled via consistency distillation (CD) from an EDM model trained on the ImageNet 64x64 dataset, using the L2 distance (Euclidean norm) as the measure of closeness. See the original model card for more information.
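
For readers unfamiliar with the metric, here is a minimal sketch of an L2 distance between two images represented as flattened lists of pixel values. The names `student_output` and `teacher_target` are illustrative assumptions; the original training code operates on batched tensors and may use a squared or averaged variant.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equally sized, flattened images."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of elements")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical pixel values: the two "images" differ in a single pixel.
student_output = [0.0, 0.5, 1.0, 0.25]
teacher_target = [0.0, 0.5, 0.0, 0.25]
distance = l2_distance(student_output, teacher_target)
```

A smaller distance means the distilled model's output is closer to the teacher's target for the same noisy input.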

Download

The original PyTorch model checkpoint can be downloaded from the original code and model release.

The diffusers pipeline for the cd_imagenet64_l2 model can be downloaded as follows:

from diffusers import ConsistencyModelPipeline

pipe = ConsistencyModelPipeline.from_pretrained("openai/diffusers-cd_imagenet64_l2")

Usage

The original model checkpoint can be used with the original consistency models codebase.

Here is an example of using the cd_imagenet64_l2 checkpoint with diffusers:

import torch

from diffusers import ConsistencyModelPipeline

device = "cuda"
# Load the cd_imagenet64_l2 checkpoint.
model_id_or_path = "openai/diffusers-cd_imagenet64_l2"
pipe = ConsistencyModelPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to(device)

# One-step sampling
image = pipe(num_inference_steps=1).images[0]
image.save("cd_imagenet64_l2_onestep_sample.png")

# One-step sampling, class-conditional image generation
# ImageNet-64 class label 145 corresponds to king penguins
image = pipe(num_inference_steps=1, class_labels=145).images[0]
image.save("cd_imagenet64_l2_onestep_sample_penguin.png")

# Multistep sampling, class-conditional image generation
# Timesteps can be explicitly specified; the particular timesteps below are from the original GitHub repo:
# https://github.com/openai/consistency_models/blob/main/scripts/launch.sh#L77
image = pipe(num_inference_steps=None, timesteps=[22, 0], class_labels=145).images[0]
image.save("cd_imagenet64_l2_multistep_sample_penguin.png")
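
To give a feel for what multistep sampling does, the sketch below alternates denoising and re-noising on a one-dimensional toy problem, in the spirit of the multistep procedure described in the paper. Everything here is an assumption for illustration: `toy_consistency_fn` stands in for a trained model, the "dataset" is a single value `DATA_MEAN`, and the noise levels are made up. In particular, the level `0.5` is not what the diffusers scheduler derives from `timesteps=[22, 0]`.

```python
import math
import random

EPSILON = 0.002  # assumed smallest noise level
T_MAX = 80.0     # assumed largest noise level
DATA_MEAN = 1.5  # toy "dataset": every sample equals this single value


def toy_consistency_fn(x, t):
    """Hypothetical stand-in for a trained consistency model.

    For the toy point-mass dataset this maps (x, t) to the start of the
    trajectory through x, and returns x unchanged when t == EPSILON.
    """
    return DATA_MEAN + (EPSILON / t) * (x - DATA_MEAN)


def multistep_sample(consistency_fn, taus, rng):
    """Alternate denoising and re-noising over decreasing noise levels `taus`."""
    # One-step sample: map pure noise at T_MAX directly to a data estimate.
    x = consistency_fn(rng.gauss(0.0, T_MAX), T_MAX)
    for tau in taus:
        # Re-noise the current estimate up to level tau, then denoise again.
        noisy = x + math.sqrt(tau**2 - EPSILON**2) * rng.gauss(0.0, 1.0)
        x = consistency_fn(noisy, tau)
    return x


rng = random.Random(0)
one_step = multistep_sample(toy_consistency_fn, [], rng)
two_step = multistep_sample(toy_consistency_fn, [0.5], rng)
```

With a real model, each extra step costs one more network evaluation, which is the compute-for-quality trade-off mentioned in the abstract.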

Model Details

  • Model type: Consistency model for unconditional image generation, distilled from a diffusion model
  • Dataset: ImageNet 64x64
  • License: MIT
  • Model Description: This model performs unconditional image generation. Its main component is a U-Net, which parameterizes the consistency model. This model was distilled by the Consistency Model authors from an EDM diffusion model, also originally trained by the authors.
  • Resources for more information: Paper, GitHub Repository, [Original Model Card](/openai/consistency_models/blob/main/model-card.md)

Datasets

_Note: This section is taken from the "Datasets" section of the original model card_.

The models that we are making available have been trained on the ILSVRC 2012 subset of ImageNet or on individual categories from LSUN. Here we outline the characteristics of these datasets that influence the behavior of the models:

ILSVRC 2012 subset of ImageNet: This dataset was curated in 2012 and has around a million pictures, each of which belongs to one of 1,000 categories. A significant number of the categories in this dataset are animals, plants, and other naturally occurring objects. Although…
