OpenAI, published Jul 9, 2018

Glow: Better reversible generative models


July 9, 2018


We introduce Glow, a reversible generative model which uses invertible 1x1 convolutions. It extends previous⁠ work⁠ on reversible generative models and simplifies the architecture. Our model can generate realistic high resolution images, supports efficient sampling, and discovers features that can be used to manipulate attributes of data. We’re releasing code for the model and an online visualization tool so people can explore and build on these results.


Motivation


Generative modeling is about observing data, like a set of pictures of faces, then learning a model of how this data was generated. Learning to approximate the data-generating process requires learning all structure present in the data, and successful models should be able to synthesize outputs that look similar to the data. Accurate generative models have broad applications, including speech synthesis⁠, text analysis and synthesis⁠, semi-supervised learning⁠ and model-based control⁠. The technique we propose can be applied to those problems as well.

Glow is a type of reversible generative model, also called a flow-based generative model, and is an extension of the NICE and RealNVP techniques. Flow-based generative models have so far gained little attention in the research community compared to GANs and VAEs.

Some of the merits of flow-based generative models include:

  • Significant potential for memory savings. Computing gradients in reversible neural networks requires an amount of memory that is constant instead of linear in their depth, as explained in the RevNet paper⁠.
  • Useful latent space for downstream tasks. The hidden layers of autoregressive models have unknown marginal distributions, making it much more difficult to perform valid manipulation of data. In GANs, datapoints usually cannot be directly represented in a latent space, as they have no encoder and might not have full support over the data distribution. This is not the case for reversible generative models and VAEs, which allow for various applications such as interpolations between datapoints and meaningful modifications of existing datapoints.
  • Efficient inference and efficient synthesis. Autoregressive models, such as the PixelCNN, are also reversible. However, synthesis from such models is difficult to parallelize and typically inefficient on parallel hardware. Flow-based generative models like Glow (and RealNVP) are efficient to parallelize for both inference and synthesis.
  • Exact latent-variable inference and log-likelihood evaluation. In VAEs, one is able to infer only approximately the value of the latent variables that correspond to a datapoint. GANs have no encoder at all to infer the latents. In reversible generative models, this can be done exactly without approximation. Not only does this lead to accurate inference, it also enables optimization of the exact log-likelihood of the data, instead of a lower bound of it.
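
To make the last point concrete, here is a minimal sketch of exact log-likelihood evaluation through the change-of-variables formula, log p(x) = log p_z(f(x)) + log |det df/dx|. It is not the Glow model: `AffineFlow`, its `shift` and `scale` parameters, and the sample values are assumptions chosen only to keep the example small, with a standard normal assumed as the latent prior.

```python
import math


def standard_normal_logpdf(z):
    """Log-density of a scalar under N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


class AffineFlow:
    """Toy invertible map z = (x - shift) / scale, applied per dimension.

    Hypothetical stand-in for a trained flow; scale must be nonzero.
    """

    def __init__(self, shift, scale):
        if scale == 0.0:
            raise ValueError("scale must be nonzero for the map to be invertible")
        self.shift = shift
        self.scale = scale

    def encode(self, x):
        # Exact inference: every x maps to exactly one latent z.
        return [(xi - self.shift) / self.scale for xi in x]

    def decode(self, z):
        # Exact inverse of encode.
        return [zi * self.scale + self.shift for zi in z]

    def log_prob(self, x):
        # Prior log-density of the latent plus the log-determinant of the
        # Jacobian of encode, which is -log|scale| for each dimension.
        z = self.encode(x)
        log_det = -len(x) * math.log(abs(self.scale))
        return sum(standard_normal_logpdf(zi) for zi in z) + log_det


flow = AffineFlow(shift=1.0, scale=2.0)
x = [0.5, 3.0, -1.0]
z = flow.encode(x)
x_reconstructed = flow.decode(z)
log_px = flow.log_prob(x)
```

For this toy map the result coincides with the log-density of a Gaussian with mean 1 and standard deviation 2, which is an easy way to check the formula by hand.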

Results

Using our techniques we achieve significant improvements on standard benchmarks compared to RealNVP, the previous best published result with flow-based generative models.

Dataset                  RealNVP   Glow
CIFAR-10                 3.49      3.35
ImageNet 32x32           4.28      4.09
ImageNet 64x64           3.98      3.81
LSUN (bedroom)           2.72      2.38
LSUN (tower)             2.81      2.46
LSUN (church outdoor)    3.08      2.67

Quantitative performance in terms of bits per dimension (lower is better), evaluated on the test set of various datasets, for the RealNVP model versus our Glow model.
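
For readers unfamiliar with the unit, the sketch below shows how a negative log-likelihood in nats relates to bits per dimension: divide by the number of dimensions and by ln 2. The function names are hypothetical, and the example assumes the negative log-likelihood already accounts for the discretization of pixel values.

```python
import math


def bits_per_dimension(nll_nats, num_dims):
    """Convert a per-example negative log-likelihood in nats to bits per dimension."""
    if num_dims <= 0:
        raise ValueError("num_dims must be positive")
    return nll_nats / (num_dims * math.log(2.0))


def nats_from_bits_per_dimension(bpd, num_dims):
    """Inverse conversion: total negative log-likelihood in nats."""
    return bpd * num_dims * math.log(2.0)


# A 32x32 RGB image has 32 * 32 * 3 dimensions.
num_dims = 32 * 32 * 3

# Total NLL in nats that corresponds to the 4.09 bits/dim reported for
# Glow on ImageNet 32x32, then converted back as a round-trip check.
nll_nats = nats_from_bits_per_dimension(4.09, num_dims)
bpd = bits_per_dimension(nll_nats, num_dims)
```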

[Figure: samples from the model]

Glow models can generate realistic-looking high-resolution images, and can do so efficiently. Our model takes about 130 ms to generate a 256 x 256 sample on an NVIDIA 1080 Ti GPU. Like previous work, we found that sampling from a reduced-temperature model often results in higher-quality samples. The samples above were obtained by scaling the standard deviation of the latents by a temperature of 0.7.
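
Reduced-temperature sampling can be sketched as follows, assuming a standard normal prior over the latents: draw the latent as usual and scale it by the temperature before decoding. The helper names and the fixed seed are illustrative assumptions, and the decoding step through a trained model is left out.

```python
import math
import random


def sample_latent(num_dims, temperature, rng):
    """Draw a latent vector from N(0, temperature^2 * I)."""
    return [temperature * rng.gauss(0.0, 1.0) for _ in range(num_dims)]


def empirical_std(values):
    """Population standard deviation of a list of floats."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


rng = random.Random(0)  # fixed seed so the run is repeatable

z_full = sample_latent(10_000, temperature=1.0, rng=rng)
z_reduced = sample_latent(10_000, temperature=0.7, rng=rng)

std_full = empirical_std(z_full)
std_reduced = empirical_std(z_reduced)

# An image would then come from the trained model's decoder, for example
# model.decode(z_reduced) (hypothetical method name, not shown here).
```

The reduced-temperature latents concentrate closer to the mode of the prior, which trades sample diversity for quality.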

Interpolation in latent space

We can also interpolate between arbitrary faces, by using the encoder to encode the two images and sample from intermediate points. Note that the inputs are arbitrary faces and not samples from the model, thus providing evidence that the model has support over the full target distribution.
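
The interpolation procedure can be sketched in a few lines. The `encode` and `decode` functions below are a toy invertible pair standing in for a trained flow, and the three-element vectors stand in for images; all of these are assumptions for illustration only.

```python
def encode(x):
    """Toy invertible encoder (stand-in for a trained flow)."""
    return [2.0 * xi + 1.0 for xi in x]


def decode(z):
    """Exact inverse of the toy encoder."""
    return [(zi - 1.0) / 2.0 for zi in z]


def interpolate_latents(z_a, z_b, num_steps):
    """Return num_steps latents evenly spaced on the line from z_a to z_b."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2 to include both endpoints")
    path = []
    for i in range(num_steps):
        t = i / (num_steps - 1)
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path


x_a = [0.0, 0.2, 0.9]  # first input
x_b = [1.0, 0.4, 0.1]  # second input

# Encode both inputs, walk the straight line between them, decode each point.
path = [decode(z) for z in interpolate_latents(encode(x_a), encode(x_b), 5)]
```

Because the toy encoder is linear, the decoded path here is just a linear blend of the inputs. With a trained nonlinear flow the intermediate points would differ from a pixel-space blend.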


Manipulation in latent space

We can train a flow-based model, without labels, and then use the learned latent representation for downstream tasks like manipulating attributes of your input. These semantic attributes could be the color of hair in a face, the style of an image, the pitch of a musical sound, or the emotion of a text sentence. Since flow-based models have a perfect encoder, you can encode inputs and compute the average latent vector of inputs with and without the attribute. The vector direction between the two can then be used to manipulate an arbitrary input towards that attribute.

The above process requires a relatively small amount of labeled data, and can be done after the model has been trained (no labels are needed while training). Previous work using GANs requires training an encoder separately. Approaches using VAEs only guarantee that the decoder and encoder are compatible for in-distribution data. Other approaches, like Cycle-GANs, directly learn the function representing the transformation, but they require retraining for every transformation.


    # Train flow model on large, unlabelled dataset X
    m = train(X_unlabelled)

    # Split labelled dataset based on attribute, say blonde hair
    X_positive, X_negative = split(X_labelled)

    # Obtain average encodings of positive and negative inputs
    z_positive = average([m.encode(x) for x in X_positive])
    z_negative = average([m.encode(x) for x in X_negative])

    # Get manipulation vector by taking difference
    z_manipulate = z_positive - z_negative

    # Manipulate new x_input along z_manipulate, by a scalar alpha \in [-1,1]
    z_input = m.encode(x_input)
    x_manipulated = m.decode(z_input + alpha * z_manipulate)

Simple code snippet for using a flow-based model for manipulating attributes
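
The snippet above leaves `train`, `split`, and `average` undefined. A runnable variant, under stated assumptions, might look like the sketch below: `ToyFlow` is a hypothetical invertible model standing in for the trained flow `m`, and the tiny hand-written datasets stand in for labelled images with and without the attribute.

```python
class ToyFlow:
    """Hypothetical stand-in for a trained flow: z = scale * x, per dimension."""

    def __init__(self, scale):
        if scale == 0.0:
            raise ValueError("scale must be nonzero for the map to be invertible")
        self.scale = scale

    def encode(self, x):
        return [self.scale * xi for xi in x]

    def decode(self, z):
        return [zi / self.scale for zi in z]


def average(vectors):
    """Elementwise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


m = ToyFlow(scale=2.0)

# Made-up labelled data: the first coordinate carries the attribute.
X_positive = [[1.0, 0.0], [1.2, 0.2]]
X_negative = [[0.0, 0.0], [0.2, 0.2]]

# Average encodings of inputs with and without the attribute.
z_positive = average([m.encode(x) for x in X_positive])
z_negative = average([m.encode(x) for x in X_negative])

# Manipulation vector is the difference of the two averages.
z_manipulate = [p - n for p, n in zip(z_positive, z_negative)]

# Move a new input along the manipulation vector by a scalar alpha.
x_input = [0.5, 0.5]
alpha = 1.0
z_input = m.encode(x_input)
x_manipulated = m.decode([zi + alpha * di for zi, di in zip(z_input, z_manipulate)])
```

In this toy setup only the first coordinate of `x_input` changes, since the two groups differ only in that coordinate.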

Contribution

Our main contribution and also our departure from the earlier RealNVP work is the addition of a reversible 1x1 convolution, as well as removing other components, simplifying the architecture overall.
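
As an illustration of what a reversible 1x1 convolution does, the sketch below mixes the channels of every pixel with the same square matrix, inverts the operation with the matrix inverse, and computes the layer's log-determinant, which the Glow paper gives as height x width x log|det W|. The sketch is restricted to two channels so it needs only the standard library; the function names, weights, and image values are assumptions for illustration.

```python
import math


def det2(w):
    """Determinant of a 2x2 matrix given as nested lists."""
    return w[0][0] * w[1][1] - w[0][1] * w[1][0]


def inverse2(w):
    """Inverse of a 2x2 matrix; raises if the matrix is singular."""
    d = det2(w)
    if d == 0.0:
        raise ValueError("weight matrix is singular, so the layer is not invertible")
    return [[w[1][1] / d, -w[0][1] / d],
            [-w[1][0] / d, w[0][0] / d]]


def conv1x1(image, w):
    """Apply the same 2x2 channel-mixing matrix at every pixel.

    image[row][col] is a list of two channel values.
    """
    return [[[w[i][0] * px[0] + w[i][1] * px[1] for i in range(2)]
             for px in row]
            for row in image]


def conv1x1_log_det(height, width, w):
    """Log-determinant of the whole layer: one log|det W| term per pixel."""
    return height * width * math.log(abs(det2(w)))


w = [[1.0, 2.0], [3.0, 4.0]]  # example weights with determinant -2
image = [[[0.1, 0.2], [0.3, 0.4]],
         [[0.5, 0.6], [0.7, 0.8]]]  # 2x2 image with 2 channels

encoded = conv1x1(image, w)
decoded = conv1x1(encoded, inverse2(w))  # reverse pass recovers the input
log_det = conv1x1_log_det(2, 2, w)
```

With more channels the same idea applies with a general matrix inverse and determinant from a linear algebra library.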

The RealNVP architecture consists of sequences of two types of layers: layers with…
