Machine learning needs better tools

Replicate Blog

Posted February 21, 2023 by bfirsh

Replicate runs machine learning models in the cloud. We have a library of open-source models that you can run with a few lines of code. If you’re building your own machine learning models, Replicate makes it easy to deploy them at scale. Learn more.

Machine learning used to be an academic pursuit. If you wanted to work on it, you probably needed to be part of a lab or have a PhD.

In early 2021, there was a shift. Advadnoun released a Colab notebook called The Big Sleep. RiversHaveWings followed up with the VQGAN+CLIP notebook. These notebooks turned text descriptions into images by guiding a GAN with CLIP.

These projects weren’t increasing some accuracy metric by a fraction of a percent. They weren’t published as academic papers. They were just pieces of software that did something neat.

Image generated by The Big Sleep, circa 2021

People made copies of these notebooks and built upon them. We got pixray, dalle-mini, Disco Diffusion. It felt like something new was happening every week.

These people were not affiliated with a lab. They were often self-taught, just tinkering in their spare time. Or they were software engineers, cobbling together bits of machine learning code.

Part of what made this all possible was pre-trained foundation models. Instead of having to train a model from scratch at great expense, individuals could pick models off-the-shelf and combine them in interesting ways. Kind of like how you can import a few packages from npm and plug them together to make something new.

It all felt a lot more like open-source software than machine learning research.

Then, Stable Diffusion came along.

The now-classic “astronaut riding on a horse”, generated by Stable Diffusion

DALL-E 2, a text-to-image model of similar quality, was released a few months earlier but it was closed source and behind a private beta. We’re used to advances in normal software being open source and on GitHub. DALL-E was the most extraordinary piece of software that had been seen for years, and software engineers couldn’t use it.

That’s what created the fertile ground for Stable Diffusion. Stable Diffusion was open-source, and it was much better quality than the previous generation of open-source text-to-image models.

It caught the imagination of hackers and there was an explosion of forks: inpainting, animation, texture generation, fine-tuning, and so on.

These open-source image generation models were suddenly good enough to be useful. People were building a ton of stuff. There were mobile apps to generate images from text, apps for creating avatars of yourself from just a few training images, and procedurally generated games.

There’s a catch, though.

Machine learning is too hard to use

If you try to actually build something with these machine learning models, you find that none of it really works. You spend all day battling with messy Python scripts, broken Colab notebooks, perplexing CUDA errors, misshapen tensors. It’s a mess.

Normal software used to be like this. If you wanted to build a website 20 years ago it felt like trying to use machine learning today. You had to build web servers, authentication systems, user interface components. You were concatenating HTML and SQL by hand, hoping you didn’t get owned. To deploy, you uploaded files to an FTP server and waited and hoped for the best.

But then we got Ruby on Rails. And Django. And React. And Heroku. And Next.js. And Vercel. And all the rest. Tools that made software development easier and more fun.

In his Software 2.0 essay, Andrej Karpathy introduced the idea that deep learning is becoming sufficiently advanced that it is replacing large parts of normal software. But he also said it’s a weird kind of software that requires a new set of tools to use. He made the case that we need a new tooling stack:

The lens through which we view trends matters. If you recognize Software 2.0 as a new and emerging programming paradigm instead of simply treating neural networks as a pretty good classifier in the class of machine learning techniques, the extrapolations become more obvious, and it’s clear that there is much more work to do. … Who is going to develop the first Software 2.0 IDEs, which help with all of the workflows in accumulating, visualizing, cleaning, labeling, and sourcing datasets? … Traditional package managers and related serving infrastructure like pip, conda, docker, etc. help us more easily deploy and compose binaries. How do we effectively deploy, share, import and work with Software 2.0 binaries?

The reason machine learning is so hard to use is not because it’s inherently hard. We just don’t have good tools and abstractions yet. You shouldn’t have to understand GPUs to use machine learning, in the same way you don’t have to understand TCP/IP to build a website.

Machine learning was hard at Spotify

Andreas is an old friend of mine. He was a machine learning engineer at Spotify. He was a kinda weird machine learning engineer in that he did everything from fundamental research all the way down to building one of the main deep learning pipelines at Spotify. Everything from math, to products, to servers.

My background is in developer tools, most recently at Docker where I created Docker Compose. I also founded a few now-defunct developer tools startups, accidentally wrote a book about command-line interfaces, and was a misuser of web browsers a long time ago.

Andreas was telling me about a cluster of related problems at Spotify:

It was hard to run open-source machine learning models. All these advances were locked up inside prose in PDFs, scraps of code on GitHub, weights on Google Drive (if you were lucky!). If you wanted to build upon this research, or apply it to real-world problems, you had to implement it all from scratch.

It was hard to deploy machine learning models to production. Typically a researcher would have to sit down with an engineer to decide on an API, get a server written, package up dependencies, battle CUDA, get it running efficiently, and so on and so forth. It would take weeks to get something running in production.
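To give a feel for the glue code that second problem involves, here is a minimal sketch in Python of wrapping a model behind a JSON HTTP API using only the standard library. Everything in it is an assumption made for illustration: `EchoLengthModel`, its `setup`/`predict` methods, the `{"prompt": ...}` request shape, and the port are all hypothetical, and nothing here is how Spotify actually did it. A real deployment would still need the parts this skips: dependency packaging, GPU setup, batching, and so on.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoLengthModel:
    """Hypothetical stand-in for a real model: it 'predicts' a prompt's length."""

    def setup(self):
        # A real model would load its weights here, once per process.
        self.ready = True

    def predict(self, prompt: str) -> dict:
        return {"prompt": prompt, "length": len(prompt)}


def handle_request(model, raw_body: bytes):
    """Turn a JSON request body into (status_code, JSON response bytes)."""
    try:
        payload = json.loads(raw_body)
        prompt = payload["prompt"]
        if not isinstance(prompt, str):
            raise TypeError("prompt must be a string")
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, a missing field, or a wrong type is a client error.
        return 400, json.dumps({"error": str(exc)}).encode()
    return 200, json.dumps(model.predict(prompt)).encode()


def make_handler(model):
    """Build a request handler class bound to one already-loaded model."""

    class PredictHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            status, body = handle_request(model, self.rfile.read(length))
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return PredictHandler


def serve(model, host="127.0.0.1", port=8000):
    """Load the model once, then block serving predictions over HTTP."""
    model.setup()
    HTTPServer((host, port), make_handler(model)).serve_forever()
```

Calling `serve(EchoLengthModel())` would start the server, and a POST with body `{"prompt": "an astronaut"}` would go through `handle_request`. Even this toy version has to settle on an input schema, error handling, and a model lifecycle, which is the kind of decision a researcher and an engineer had to work out by hand for every model.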

Through my Docker eyes, these things Andreas was describing were Docker-shaped problems.

If only we could define a standard box for machine learning models, then researchers could put their models inside it. They could…
