Glow: Better reversible generative models
July 9, 2018
Milestone
We introduce Glow, a reversible generative model which uses invertible 1x1 convolutions. It extends previous work on reversible generative models and simplifies the architecture. Our model can generate realistic high-resolution images, supports efficient sampling, and discovers features that can be used to manipulate attributes of data. We’re releasing code for the model and an online visualization tool so people can explore and build on these results.
Motivation
Generative modeling is about observing data, like a set of pictures of faces, then learning a model of how this data was generated. Learning to approximate the data-generating process requires learning all structure present in the data, and successful models should be able to synthesize outputs that look similar to the data. Accurate generative models have broad applications, including speech synthesis, text analysis and synthesis, semi-supervised learning and model-based control. The technique we propose can be applied to those problems as well.
Glow is a type of reversible generative model, also called a flow-based generative model, and is an extension of the NICE and RealNVP techniques. Flow-based generative models have so far gained little attention in the research community compared to GANs and VAEs.
Some of the merits of flow-based generative models include:
- Significant potential for memory savings. Computing gradients in reversible neural networks requires an amount of memory that is constant instead of linear in their depth, as explained in the RevNet paper.
- Useful latent space for downstream tasks. The hidden layers of autoregressive models have unknown marginal distributions, making it much more difficult to perform valid manipulation of data. In GANs, datapoints usually cannot be represented directly in a latent space, since GANs have no encoder and might not have full support over the data distribution. This is not the case for reversible generative models and VAEs, which allow for various applications such as interpolations between datapoints and meaningful modifications of existing datapoints.
- Efficient inference and efficient synthesis. Autoregressive models, such as PixelCNN, are also reversible; however, synthesis from such models is difficult to parallelize and typically inefficient on parallel hardware. Flow-based generative models like Glow (and RealNVP) are efficient to parallelize for both inference and synthesis.
- Exact latent-variable inference and log-likelihood evaluation. In VAEs, one is able to infer only approximately the value of the latent variables that correspond to a datapoint. GANs have no encoder at all to infer the latents. In reversible generative models, this can be done exactly without approximation. Not only does this lead to accurate inference, it also enables optimization of the exact log-likelihood of the data, instead of a lower bound of it.
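To make the last point concrete, here is a minimal sketch of exact inference and exact log-likelihood in a one-dimensional toy flow. The affine map and its parameters `LOG_SCALE` and `SHIFT` are assumptions chosen for illustration and are not part of Glow; the point is only that an invertible map with a known Jacobian gives both an exact encoder and an exact density through the change-of-variables formula.

```python
import math

# Toy one-dimensional flow: x = decode(z) = exp(LOG_SCALE) * z + SHIFT.
# LOG_SCALE and SHIFT stand in for learned parameters (assumed values).
LOG_SCALE = 0.5
SHIFT = 2.0


def encode(x):
    """Exact inverse map from a datapoint x to its latent z."""
    return (x - SHIFT) * math.exp(-LOG_SCALE)


def decode(z):
    """Map a latent z back to data space."""
    return math.exp(LOG_SCALE) * z + SHIFT


def log_standard_normal(z):
    """Log-density of the standard normal prior over z."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def log_likelihood(x):
    """Change of variables: log p(x) = log p(z) + log |dz/dx|."""
    z = encode(x)
    log_det = -LOG_SCALE  # dz/dx = exp(-LOG_SCALE) for this affine map
    return log_standard_normal(z) + log_det


x = 3.1
z = encode(x)
print("reconstruction error:", abs(decode(z) - x))
print("exact log-likelihood:", log_likelihood(x))
```

Because this toy map is affine, the resulting density is a Gaussian with mean `SHIFT` and standard deviation `exp(LOG_SCALE)`, which gives an easy way to check the formula.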
Results
Using our techniques we achieve significant improvements on standard benchmarks compared to RealNVP, the previous best published result with flow-based generative models.
| Dataset | RealNVP | Glow |
|---|---|---|
| CIFAR-10 | 3.49 | 3.35 |
| ImageNet 32x32 | 4.28 | 4.09 |
| ImageNet 64x64 | 3.98 | 3.81 |
| LSUN (bedroom) | 2.72 | 2.38 |
| LSUN (tower) | 2.81 | 2.46 |
| LSUN (church outdoor) | 3.08 | 2.67 |
Quantitative performance in terms of bits per dimension (lower is better) evaluated on the test set of various datasets, for the RealNVP model versus our Glow model.*
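For readers unfamiliar with the metric, bits per dimension is the negative log-likelihood of a datapoint expressed in bits and divided by its number of dimensions. The sketch below shows only that unit conversion; the function name and the example value are hypothetical, and it leaves out details such as how discrete pixel values are dequantized before a continuous model is evaluated.

```python
import math


def bits_per_dimension(nll_nats, num_dims):
    """Convert a per-example negative log-likelihood in nats to bits per dimension."""
    return nll_nats / (num_dims * math.log(2.0))


# A 32x32 RGB image has 32 * 32 * 3 = 3072 dimensions.
num_dims = 32 * 32 * 3
nll_nats = 7000.0  # hypothetical average NLL per image, in nats
print(bits_per_dimension(nll_nats, num_dims))
```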
Glow models can generate realistic-looking high-resolution images, and can do so efficiently. Our model takes about 130ms to generate a 256 x 256 sample on an NVIDIA 1080 Ti GPU. Like previous work, we found that sampling from a reduced-temperature model often results in higher-quality samples. The samples above were obtained by scaling the standard deviation of the latents by a temperature of 0.7.
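Reduced-temperature sampling can be sketched as follows, assuming a standard normal prior over the latents. The helper `sample_latent` is a hypothetical name, and the decoding step is only indicated in a comment because it depends on the trained model.

```python
import random


def sample_latent(num_dims, temperature=0.7, rng=random):
    """Draw z ~ N(0, temperature^2 * I) by scaling standard normal draws."""
    return [temperature * rng.gauss(0.0, 1.0) for _ in range(num_dims)]


rng = random.Random(0)  # fixed seed so the run is repeatable
z = sample_latent(10000, temperature=0.7, rng=rng)

# Empirical standard deviation of the latents; it should be close to 0.7.
mean = sum(z) / len(z)
std = (sum((v - mean) ** 2 for v in z) / len(z)) ** 0.5
print("empirical std of latents:", std)

# With a trained model one would then decode, e.g. x = model.decode(z),
# where model.decode is an assumed interface and not defined here.
```

Setting `temperature=1.0` recovers ordinary sampling from the prior; smaller values concentrate samples near the mode of the latent distribution.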
Interpolation in latent space
We can also interpolate between arbitrary faces, by using the encoder to encode the two images and sample from intermediate points. Note that the inputs are arbitrary faces and not samples from the model, thus providing evidence that the model has support over the full target distribution.
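A minimal sketch of picking intermediate points, assuming straight-line (linear) interpolation between the two encodings. The function `interpolate_latents` is a hypothetical helper, and the two latent vectors are made-up values standing in for the encodings of two real images.

```python
def interpolate_latents(z_a, z_b, num_steps):
    """Return num_steps latent vectors evenly spaced on the line from z_a to z_b."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    path = []
    for i in range(num_steps):
        t = i / (num_steps - 1)  # t runs from 0.0 to 1.0 inclusive
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path


# Stand-ins for encode(image_a) and encode(image_b).
z_a = [0.0, 1.0, -2.0]
z_b = [4.0, -1.0, 2.0]

path = interpolate_latents(z_a, z_b, 5)
for z in path:
    print(z)  # each z would be passed to the decoder to render a frame
```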
Manipulation in latent space
We can train a flow-based model, without labels, and then use the learned latent representation for downstream tasks like manipulating attributes of your input. These semantic attributes could be the color of hair in a face, the style of an image, the pitch of a musical sound, or the emotion of a text sentence. Since flow-based models have a perfect encoder, you can encode inputs and compute the average latent vector of inputs with and without the attribute. The vector direction between the two can then be used to manipulate an arbitrary input towards that attribute.
The above process requires a relatively small amount of labeled data, and can be done after the model has been trained (no labels are needed while training). Previous work using GANs requires training an encoder separately. Approaches using VAEs only guarantee that the decoder and encoder are compatible for in-distribution data. Other approaches, like CycleGANs, directly learn the function representing the transformation; however, they require retraining for every transformation.
```python
# Train flow model on large, unlabelled dataset X
m = train(X_unlabelled)

# Split labelled dataset based on attribute, say blonde hair
X_positive, X_negative = split(X_labelled)

# Obtain average encodings of positive and negative inputs
z_positive = average([m.encode(x) for x in X_positive])
z_negative = average([m.encode(x) for x in X_negative])

# Get manipulation vector by taking difference
z_manipulate = z_positive - z_negative

# Manipulate new x_input along z_manipulate, by a scalar alpha \in [-1,1]
z_input = m.encode(x_input)
x_manipulated = m.decode(z_input + alpha * z_manipulate)
```
Simple code snippet for using a flow-based model for manipulating attributes
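The same recipe can be run end to end on a toy example. In the sketch below, `ToyFlow` is a hypothetical stand-in for a trained model (a fixed elementwise affine map, not Glow), and the labelled datapoints are made-up two-dimensional vectors; only the sequence of steps mirrors the snippet.

```python
class ToyFlow:
    """Stand-in invertible model: a fixed elementwise affine map (not Glow)."""

    def __init__(self, scale, shift):
        self.scale = scale
        self.shift = shift

    def encode(self, x):
        return [(xi - b) / s for xi, s, b in zip(x, self.scale, self.shift)]

    def decode(self, z):
        return [zi * s + b for zi, s, b in zip(z, self.scale, self.shift)]


def average(vectors):
    """Elementwise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


m = ToyFlow(scale=[2.0, 0.5], shift=[1.0, -1.0])

# Made-up labelled data: the attribute only shifts the first coordinate.
X_positive = [[5.0, 0.0], [7.0, 1.0]]
X_negative = [[1.0, 0.0], [3.0, 1.0]]

z_positive = average([m.encode(x) for x in X_positive])
z_negative = average([m.encode(x) for x in X_negative])
z_manipulate = [p - n for p, n in zip(z_positive, z_negative)]


def manipulate(x_input, alpha):
    """Move x_input along the attribute direction by a scalar alpha."""
    z_input = m.encode(x_input)
    return m.decode([zi + alpha * d for zi, d in zip(z_input, z_manipulate)])


print(z_manipulate)
print(manipulate([1.0, 0.0], alpha=1.0))
```

With `alpha=0.0` the input is returned unchanged, since encoding followed by decoding is exact for an invertible model.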
Contribution
Our main contribution, and our departure from the earlier RealNVP work, is the addition of a reversible 1x1 convolution, together with the removal of other components, which simplifies the architecture overall.
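To illustrate what a reversible 1x1 convolution does, here is a small sketch under simplifying assumptions: two channels, a hand-picked weight matrix, and plain Python lists instead of tensors. A 1x1 convolution applies the same channel-mixing matrix at every pixel, so it can be undone with the inverse matrix, and its log-determinant is the log-determinant of that matrix counted once per pixel. All function names here are hypothetical.

```python
import math


def conv1x1(image, w):
    """Apply the same 2x2 channel-mixing matrix w at every pixel."""
    return [
        [
            [w[0][0] * p[0] + w[0][1] * p[1], w[1][0] * p[0] + w[1][1] * p[1]]
            for p in row
        ]
        for row in image
    ]


def det_2x2(w):
    return w[0][0] * w[1][1] - w[0][1] * w[1][0]


def inverse_2x2(w):
    """Closed-form inverse of a 2x2 matrix (requires a nonzero determinant)."""
    det = det_2x2(w)
    return [[w[1][1] / det, -w[0][1] / det], [-w[1][0] / det, w[0][0] / det]]


def log_det_conv1x1(w, height, width):
    """Log-determinant of the whole layer: height * width * log|det(w)|."""
    return height * width * math.log(abs(det_2x2(w)))


# A 2x2 image with 2 channels per pixel, and an assumed invertible weight.
image = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, -1.0], [5.0, 0.0]]]
w = [[2.0, 1.0], [0.0, 2.0]]

mixed = conv1x1(image, w)
restored = conv1x1(mixed, inverse_2x2(w))
print(restored == image)
print(log_det_conv1x1(w, height=2, width=2))
```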
The RealNVP architecture consists of sequences of two types of layers: layers with…