Repo: Scaleway, published Mar 11, 2019

scaleway/frontalization

Python

Description: Pytorch deep learning face frontalization model

Language: Python

Stars: 226

Forks: 50

Open issues: 11

Created: 2019-03-11T14:15:39Z

Pushed: 2023-07-15T09:37:33Z

Default branch: master

Fork: no

Archived: no

README:

Pytorch implementation of a face frontalization GAN

Introduction

Screenwriters never cease to amuse us with bizarre portrayals of the tech industry, ranging from cringeworthy to hilarious. With the current advances in artificial intelligence, however, some of the most unrealistic technologies from the TV screen are coming to life. For example, the Enhance software from *CSI: NY* (or *Les Experts : Manhattan* for francophone readers) has already been outshone by state-of-the-art Super Resolution neural networks. On the more extreme side of the imagination, there is *Enemy of the State*:

[Video excerpt](https://www.youtube.com/watch?v=3EwZQddc3kY)

"Rotating [a video surveillance footage] 75 degrees around the vertical" must have seemed completely nonsensical long after 1998, when the movie came out, as evinced by the YouTube comments below this particular excerpt:

![](figs/comments.jpeg)

Despite the apparent pessimism of the audience, thanks to machine learning, today anyone with a little bit of Python knowledge and a large enough dataset can take a stab at writing a program worthy of a sci-fi drama.

The face frontalization problem

Forget MNIST, forget the boring cat vs. dog classifiers: today we are going to learn how to do something far more exciting! This project was inspired by the impressive work by R. Huang et al. (Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis), in which the authors synthesise frontal views of people's faces given their images at various angles. Below is Figure 3 from that paper, in which they compare their results [\[1\]](https://arxiv.org/abs/1704.04086) to previous work [2-6]:

| ![](figs/c_0.jpg) | ![](figs/c_1.jpg) | ![](figs/c_2.jpg) | ![](figs/c_3.jpg) | ![](figs/c_4.jpg) | ![](figs/c_5.jpg) | ![](figs/c_6.jpg) | ![](figs/c_7.jpg) |
|---|---|---|---|---|---|---|---|
| Input | [\[1\]](https://arxiv.org/abs/1704.04086) | [\[2\]](http://openaccess.thecvf.com/content_cvpr_2017/papers/Tran_Disentangled_Representation_Learning_CVPR_2017_paper.pdf) | [\[3\]](https://ieeexplore.ieee.org/document/7298667) | [\[4\]](https://arxiv.org/abs/1511.08446) | [\[5\]](https://ieeexplore.ieee.org/document/7298679) | [\[6\]](https://arxiv.org/abs/1411.7964) | Actual frontal |

Comparison of multiple face frontalization methods [\[1\]](https://arxiv.org/abs/1704.04086)

We are not going to try to reproduce the state-of-the-art model by R. Huang et al. Instead, we will construct and train a face frontalization model, producing reasonable results in a single afternoon:

| input | ![5epochs](figs/5_epochs_in.jpeg) |
|---|---|
| generated output | ![5epochs](figs/5_epochs_out.jpeg) |

Additionally, we will go over:

1. How to use NVIDIA's DALI library for highly optimized pre-processing of images on the GPU and feeding them into a deep learning model.

2. How to code, in PyTorch, a Generative Adversarial Network, praised as “the most interesting idea in the last ten years in Machine Learning” by Yann LeCun, the director of Facebook AI.

You will also have your very own Generative Adversarial Network set up to be trained on a dataset of your choice. Without further ado, let's dig in!

Setting Up Your Data

At the heart of any machine learning project lies the data. Unfortunately, Scaleway cannot provide the CMU Multi-PIE Face Database that we used for training due to copyright, so we shall proceed assuming you already have a dataset that you would like to train your model on. In order to make use of the NVIDIA Data Loading Library (DALI), the images should be in JPEG format. The dimensions of the images do not matter, since DALI resizes all the inputs to the input size required by our network (128 x 128 pixels), but a 1:1 aspect ratio is desirable to obtain the most realistic synthesised images. The advantage of using DALI over, e.g., a standard PyTorch Dataset is that whatever pre-processing (resizing, cropping, etc.) is necessary is performed on the GPU rather than the CPU, after which the pre-processed images, already on the GPU, are fed straight into the neural network.

Managing our dataset:

For the face frontalization project, we set up our dataset in the following way: the dataset folder contains a subfolder and a target frontal image for each person (aka subject). In principle, the names of the subfolders and the target images do not have to be identical (as they are in the figure below), but if we sort all the subfolders and all the targets separately in alphanumeric order, the ones corresponding to the same subject must appear at the same position in the two lists of names.

As you can see, subfolder 001/ corresponding to subject 001 contains images of the person pictured in 001.jpg: these are closely cropped images of the face under different poses, lighting conditions, and facial expressions. For the purposes of face frontalization, it is crucial to have the frontal images aligned as closely to one another as possible, whereas the other (profile) images have a little more leeway.

For instance, our target frontal images are all square and cropped in such a way that the bottom of the person's chin is located at the bottom of the image, and the midpoint between the inner corners of the eyes is situated *0.8h* above and *0.5h* to the right of the lower left corner (*h* being the image's height). This way, once the images are resized to 128 x 128, the facial features all appear at more or less the same locations in every image of the training set, and the network can learn to generate these features and combine them into realistic synthetic faces.
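The alignment rule above determines the crop square once two landmarks are known: the midpoint between the inner eye corners and the bottom of the chin. Since the eye midpoint sits *0.8h* above the chin line, *h* is the eye-to-chin distance divided by 0.8, and the square is centred horizontally on the eye midpoint. The sketch below is only an illustration of that arithmetic: the function name `frontal_crop_box` is hypothetical (it is not part of this repository), the landmark coordinates are assumed to come from some external annotation or detector, and pixel coordinates are assumed to have their origin at the top left with *y* growing downward.

```python
# Fractions taken from the alignment rule described in the text.
EYE_HEIGHT_FRACTION = 0.8  # eye midpoint lies 0.8h above the bottom edge
EYE_X_FRACTION = 0.5       # eye midpoint lies 0.5h right of the left edge


def frontal_crop_box(eye_mid_x, eye_mid_y, chin_y):
    """Return (left, top, right, bottom) of the square crop, in pixels.

    Hypothetical helper. Coordinates use a top-left origin with y growing
    downward, so the chin has a larger y value than the eyes.
    """
    if chin_y <= eye_mid_y:
        raise ValueError("the chin must lie below the eye midpoint")
    # The eye-to-chin distance equals 0.8h, which fixes the side length h.
    size = round((chin_y - eye_mid_y) / EYE_HEIGHT_FRACTION)
    left = round(eye_mid_x - EYE_X_FRACTION * size)
    bottom = round(chin_y)  # the chin sits on the bottom edge
    top = bottom - size
    return (left, top, left + size, bottom)


# Example landmarks (made-up numbers): eye midpoint at (300, 200), chin at y=440.
print(frontal_crop_box(300, 200, 440))  # (150, 140, 450, 440)
```

The returned box is not clamped to the image bounds, so a face close to the border of the photo would need padding or a different source image. The tuple order (left, top, right, bottom) was chosen because common imaging libraries accept crop boxes in that form.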

![](figs/dataset.jpeg)
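The pairing rule described in this section (sort the subfolders and the target images separately, then match them by position) can be checked on a dataset folder before any training starts. The following is a minimal standard-library sketch, not code from this repository: the names `pair_profiles_with_targets` and `make_demo_dataset` are hypothetical, and Python's plain `sorted()` is assumed to be an acceptable stand-in for the alphanumeric ordering mentioned above.

```python
import os
import tempfile

JPEG_EXTENSIONS = (".jpg", ".jpeg")


def is_jpeg(name):
    return name.lower().endswith(JPEG_EXTENSIONS)


def pair_profiles_with_targets(dataset_dir):
    """Return (profile_path, target_path) pairs matched by sorted position."""
    entries = sorted(os.listdir(dataset_dir))
    subfolders = [e for e in entries
                  if os.path.isdir(os.path.join(dataset_dir, e))]
    targets = [e for e in entries
               if is_jpeg(e) and os.path.isfile(os.path.join(dataset_dir, e))]
    if len(subfolders) != len(targets):
        raise ValueError("expected one frontal target per subject subfolder")
    pairs = []
    # The i-th subfolder and the i-th target are assumed to be the same subject.
    for folder, target in zip(subfolders, targets):
        folder_path = os.path.join(dataset_dir, folder)
        for name in sorted(os.listdir(folder_path)):
            if is_jpeg(name):
                pairs.append((os.path.join(folder_path, name),
                              os.path.join(dataset_dir, target)))
    return pairs


def make_demo_dataset(root):
    """Create a tiny placeholder layout: 001/, 002/, 001.jpg, 002.jpg."""
    for subject in ("001", "002"):
        os.makedirs(os.path.join(root, subject))
        with open(os.path.join(root, subject + ".jpg"), "w"):
            pass  # empty placeholder standing in for the frontal image
        for pose in ("a", "b"):
            with open(os.path.join(root, subject, pose + ".jpg"), "w"):
                pass  # empty placeholder standing in for a profile image


with tempfile.TemporaryDirectory() as root:
    make_demo_dataset(root)
    for profile, target in pair_profiles_with_targets(root):
        print(os.path.relpath(profile, root), "->", os.path.relpath(target, root))
```

Raising an error when the two lists differ in length catches the most common layout mistake (a missing target or a stray folder) early. It cannot detect a subject whose folder and target sort into different positions, so naming both identically, as in the figure, remains the safest choice.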

Building a DALI Pipeline:

We are now going to build a pipeline for our dataset that is going to inherit from nvidia.dali.pipeline.Pipeline. At the time of writing, DALI does not directly support reading (image, image) pairs…
