Constraining CLIPDraw

Replicate Blog

Posted May 27, 2022 by evilstreak

I’m Dom, one of the engineers here at Replicate. My background is web engineering, not machine learning. I have a pretty good sense of what machine learning can do from the outside (image generation, detecting objects in images, self-driving cars) but only a faint idea of how it actually works on the inside. I studied some AI as part of my degree, many years ago, which gives me a basic (if very out of date) grounding in things like neural networks, but that’s a long way from knowing what it’s like to actually build machine learning models.

Luckily for me, I work with a team of people who have PhDs in this stuff, so I’ve been working with our co-founder Andreas to learn more about how it works.

One of the models on Replicate that I’ve found most fascinating is CLIPDraw. You give it a prompt, it generates a whole bunch of random squiggles on a canvas, and then it slowly morphs those squiggles into something representing your prompt. It’s nowhere near as configurable or refined as something like pixray/text2image, but it somehow feels much more human and grokkable in the way it uses paths to create an image.

My desktop is full of outputs from this model, so it was the natural choice of something to work on.

Inspired by some of the art by dribnet (the author of pixray/text2image), the first change I wanted to make was to encourage CLIPDraw to do most of its drawing in the middle of the canvas. Sometimes the outputs it generates feel a bit scruffy and spread out, and I wanted to see what they would look like more tightly bunched.

A brief aside on the dev environment

I’m working on an M1 MacBook, which isn’t all that compatible with developing GPU-based machine learning models. (Though, hot off the press! PyTorch have announced M1 support, which I’m keen to try out.) To get around that, we used a Google Cloud Platform instance with an attached T4 GPU, the same GPU we use to run models on replicate.com.

Create the instance:

```shell
gcloud compute instances create my-dev-instance \
  --zone=us-central1-c \
  --machine-type=n1-standard-8 \
  --accelerator type=nvidia-tesla-t4,count=1 \
  --boot-disk-size=1024GB \
  --image-project=deeplearning-platform-release \
  --image-family=common-cu113 \
  --maintenance-policy TERMINATE \
  --restart-on-failure \
  --scopes=default,storage-rw \
  --metadata="install-nvidia-driver=True"
```

SSH into your instance with port forwarding (for the Jupyter notebook) and agent forwarding (so you can use your SSH keys with GitHub):

```shell
gcloud compute ssh --zone us-central1-c my-dev-instance -- -L 8888:localhost:8888 -A
```

Install Cog, clone your model repo, and then run a Jupyter notebook inside your Cog model’s environment:

```shell
cog run -p 8888 jupyter notebook --allow-root --ip=0.0.0.0
```

Once it’s running, open the link it prints out, and you should have access to your notebook!

Once you’ve got your instance set up you can stop and start it as needed. It’ll keep your cloned repo, and you’ll just need to rerun the cog run command each time.
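Before doing any real work, it can be worth checking from the first notebook cell that PyTorch can actually see the GPU. Here’s a minimal sketch; describe_device is a hypothetical helper, and the MPS branch (PyTorch’s new M1 backend) is guarded with getattr because older PyTorch releases don’t have torch.backends.mps at all:

```python
import torch


def describe_device() -> str:
    """Hypothetical helper: report which accelerator PyTorch can use."""
    if torch.cuda.is_available():
        # On a GCP instance with an attached T4 this should name the T4
        return f"cuda ({torch.cuda.get_device_name(0)})"
    # torch.backends.mps only exists in newer PyTorch releases (M1 support)
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(describe_device())
```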

Back to the story

This was my first experience working with PyTorch. Along the way I ran into a few issues and learning opportunities: code that seemed to run but didn’t produce any results because the gradient flow was interrupted; learning about vectorisation and how to run operations on whole tensors; and rethinking how to design my code, to take it from something procedural to something differentiable.

Breaking the gradient flow

With my background in web engineering, my initial approach to this problem would be something like: for each point, calculate its distance from the centre, and add to the loss for anything further away than some minimum. In code, after getting to grips with how tensors work, a first version (which simply adds every point’s distance, without the minimum) might look something like:


```python
for path_points in points_vars:
    loss += torch.sum(torch.tensor([torch.sqrt((p[0] - 112) ** 2 + (p[1] - 112) ** 2) for p in path_points]))
```

That runs, but unfortunately it makes absolutely no difference to the output. When we print out the value we’re adding to loss, it looks like it should work. And it’s a big number, so it should completely dominate the CLIP loss that pushes the drawing towards the prompt. To work out what’s going on, we tried to isolate this so that the loss from prompt similarity wasn’t a factor:


```python
loss = 0
for path_points in points_vars:
    loss += torch.sum(torch.tensor([torch.sqrt((p[0] - 112) ** 2 + (p[1] - 112) ** 2) for p in path_points]))
```

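This fails with the error “element 0 of tensors does not require grad and does not have a grad_fn” as soon as PyTorch tries to compute gradients from loss. At this point Andreas patiently explained to me how this works: PyTorch keeps track of all the variables and operations involved in calculating loss, so that when it comes time to minimise it, it can work out how to change the variables to make the total loss smaller. For that to work, there needs to be an unbroken connection through every operation we perform on loss. By creating a new tensor with torch.tensor(), we break that chain: the new tensor holds the right numbers, but PyTorch has no record of where they came from, so it can’t backtrack through our operations to work out how to minimise the loss. That also explains why the first version ran without complaint: the CLIP part of the loss was still connected, so there were gradients to follow, but our distance term looked like a constant and had no effect on them.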
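Here’s a small, self-contained sketch of both symptoms, using the first two points from the path_points example further down and a made-up stand-in for the CLIP loss (fake_clip_loss is purely hypothetical, not part of CLIPDraw). The distance term built with torch.tensor() has no grad_fn, adding it to a connected loss leaves the gradients untouched, and calling backward() on it alone raises the error above:

```python
import torch

# The first two points from the path_points example in this post
points = torch.tensor([[11.8556, 108.8475], [10.2365, 107.1833]], requires_grad=True)


def distance_term_broken(pts: torch.Tensor) -> torch.Tensor:
    # Mirrors the first attempt: torch.tensor() copies the distances' values
    # into a brand new tensor that has no link back to `pts`
    return torch.sum(torch.tensor([torch.sqrt((p[0] - 112) ** 2 + (p[1] - 112) ** 2) for p in pts]))


# Hypothetical stand-in for the CLIP loss: anything that depends on `points`
fake_clip_loss = (points ** 2).sum() * 1e-3

term = distance_term_broken(points)
print(term.requires_grad, term.grad_fn)  # False None

# Symptom 1: adding the term changes the loss value but not the gradients
(fake_clip_loss + term).backward(retain_graph=True)
grad_with_term = points.grad.clone()
points.grad = None
fake_clip_loss.backward()
print(torch.allclose(grad_with_term, points.grad))  # True

# Symptom 2: backpropagating from the term on its own fails
try:
    term.backward()
except RuntimeError as err:
    print(err)  # element 0 of tensors does not require grad and does not have a grad_fn
```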

One of the hardest parts about debugging this issue is that the failure was completely opaque. Andreas tells me it’s a common issue when working with ML models: you have to develop an intuition for what’s causing it to break, so you can step through the right part and work out the fix. Simplifying your code to isolate the problem is a useful approach.
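One cheap way to isolate this kind of problem is to check each loss term on its own before adding it to the total. The sketch below is a hypothetical helper (reaches_params is our own name, not a PyTorch API) built on the real torch.autograd.grad function: it reports whether gradients can flow from a term back to the parameters being optimised, which would have flagged the torch.tensor() version straight away:

```python
import torch


def reaches_params(term: torch.Tensor, params: torch.Tensor) -> bool:
    """Hypothetical helper: True if gradients can flow from `term` back to `params`."""
    if not term.requires_grad:
        # A term that doesn't require grad is a constant as far as autograd knows
        return False
    # allow_unused=True returns None instead of raising when `params` isn't in the graph;
    # retain_graph=True leaves the graph intact for the real backward pass later
    (grad,) = torch.autograd.grad(term.sum(), params, retain_graph=True, allow_unused=True)
    return grad is not None


points = torch.tensor([[11.8556, 108.8475], [10.2365, 107.1833]], requires_grad=True)
dists = [torch.sqrt((p[0] - 112) ** 2 + (p[1] - 112) ** 2) for p in points]

print(reaches_params(torch.tensor(dists).sum(), points))  # False
print(reaches_params(torch.stack(dists).sum(), points))   # True
```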

Vectorisation

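One option to fix our broken chain is to use torch.stack(), which joins the existing tensors into one while keeping their connection to the graph, instead of copying their values into a new tensor: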


```python
for path_points in points_vars:
    loss += torch.sum(torch.stack([torch.sqrt((p[0] - 112) ** 2 + (p[1] - 112) ** 2) for p in path_points]))
```
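As a quick sanity check (a sketch, reusing the first two points from the path_points example below), the stacked version does produce gradients, and they match what you’d get by hand: the derivative of a point p’s distance from the centre c = (112, 112) is (p - c) / |p - c|, a unit vector pointing away from the centre. Gradient descent steps against it, pulling each point inwards:

```python
import torch

points = torch.tensor([[11.8556, 108.8475], [10.2365, 107.1833]], requires_grad=True)

# torch.stack joins the per-point distances while keeping them in the autograd graph
loss = torch.sum(torch.stack([torch.sqrt((p[0] - 112) ** 2 + (p[1] - 112) ** 2) for p in points]))
loss.backward()

# Analytic gradient of each distance: (p - centre) / distance, i.e. a unit vector
with torch.no_grad():
    offsets = points - 112
    expected = offsets / torch.linalg.norm(offsets, dim=1, keepdim=True)

print(points.grad)
print(torch.allclose(points.grad, expected))  # True
```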

That works, but the route we actually took was to remove the list comprehension and use vectorisation, so we never have to make a new tensor in the first place. This approach came pretty naturally to Andreas, who’s got a lot of experience with differentiable programming. When working with tensors, we can apply an operation to all the items in a tensor at once. In our code above, path_points is a two-dimensional tensor, with one row per point and one column each for the x and y coordinates:


```python
path_points
tensor([[ 11.8556, 108.8475],
        [ 10.2365, 107.1833],
        [ 10.8576, 117.2298],
        [ 0.6850, 123.0041]], requires_grad=True)
```

We can subtract 112 from the tensor as a whole, rather than from each individual element:


```python
path_points - 112
tensor([[-100.1444, -3.1525],
        [-101.7635, -4.8167],
        [-101.1424,…
```
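Taking this idea all the way, here’s a minimal sketch of how the whole loop over paths could be vectorised, under a few assumptions: centre_loss, min_radius and the second path are hypothetical, min_radius brings back the “anything further away than some minimum” idea from earlier, and torch.cat joins every path’s points into one tensor (keeping them in the graph) so there’s no Python loop at all:

```python
import torch


def centre_loss(points_vars, centre=(112.0, 112.0), min_radius=0.0):
    """Hypothetical vectorised centring loss over every control point of every path."""
    # Join the (k_i, 2) tensors for each path into one (N, 2) tensor; cat keeps gradients
    all_points = torch.cat(list(points_vars), dim=0)
    # The centre is a constant, so creating a fresh tensor for it is fine
    c = torch.tensor(centre, dtype=all_points.dtype, device=all_points.device)
    # Broadcasting subtracts the centre from every row at once
    dist = torch.linalg.norm(all_points - c, dim=1)
    # Only penalise distance beyond min_radius; points inside it contribute zero
    return torch.clamp(dist - min_radius, min=0).sum()


path_a = torch.tensor([[11.8556, 108.8475], [10.2365, 107.1833],
                       [10.8576, 117.2298], [0.6850, 123.0041]], requires_grad=True)
path_b = torch.tensor([[100.0, 120.0], [130.0, 90.0]], requires_grad=True)  # made-up path

loss = centre_loss([path_a, path_b])
loss.backward()
print(loss.item(), path_a.grad.shape, path_b.grad.shape)
```

With min_radius left at 0 this gives the same total as the stacked loop, just without building one small tensor per point.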
