google-deepmind/alphafold

Published: Jun 17, 2021


Description: Open source code for AlphaFold 2.

Language: Python

License: Apache-2.0

Stars: 14656

Forks: 2626

Open issues: 307

Created: 2021-06-17T14:06:06Z

Pushed: 2026-04-22T17:58:21Z

Default branch: main

Fork: no

Archived: no

README: ![header](imgs/header.jpg)

AlphaFold

This package provides an implementation of the inference pipeline of AlphaFold v2. For simplicity, we refer to this model as AlphaFold throughout the rest of this document.

We also provide:

1. An implementation of AlphaFold-Multimer. This represents a work in progress and AlphaFold-Multimer isn't expected to be as stable as our monomer AlphaFold system. [Read the guide](#updating-existing-installation) for how to upgrade and update code.

2. The [technical note](docs/technical_note_v2.3.0.md) containing the models and inference procedure for an updated AlphaFold v2.3.0.

3. A [CASP15 baseline](docs/casp15_predictions.zip) set of predictions along with documentation of any manual interventions performed.

Any publication that discloses findings arising from using this source code or the model parameters should [cite](#citing-this-work) the AlphaFold paper and, if applicable, the AlphaFold-Multimer paper.

Please also refer to the Supplementary Information for a detailed description of the method.

**You can use a slightly simplified version of AlphaFold with community-supported versions (see below).**

If you have any questions, please contact the AlphaFold team at [alphafold@deepmind.com](mailto:alphafold@deepmind.com).

![CASP14 predictions](imgs/casp14_predictions.gif)

Installation and running your first prediction

You will need a machine running Linux; AlphaFold does not support other operating systems. Full installation requires up to 3 TB of disk space to keep genetic databases (SSD storage is recommended) and a modern NVIDIA GPU (GPUs with more memory can predict larger protein structures).
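As a rough pre-flight check, the disk-space figures above can be tested from Python before starting the download. This is a minimal sketch and not part of AlphaFold. The helper name `has_enough_space` is hypothetical, and the code assumes "TB" and "GB" mean decimal units (10^12 and 10^9 bytes).

```python
import shutil

# Figures quoted in the installation notes, assuming decimal units.
FULL_INSTALL_BYTES = 3 * 10**12  # "up to 3 TB" for the genetic databases
DOWNLOAD_BYTES = 556 * 10**9     # 556 GB download via download_all_data.sh


def has_enough_space(path: str, required_bytes: int = FULL_INSTALL_BYTES) -> bool:
    """Return True if the filesystem containing `path` has `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    target = "."  # hypothetical: point this at your intended download directory
    free_tb = shutil.disk_usage(target).free / 10**12
    print(f"{free_tb:.2f} TB free")
    print("Enough for the download:", has_enough_space(target, DOWNLOAD_BYTES))
    print("Enough for a full install:", has_enough_space(target))
```

The check runs against the filesystem that will hold the databases, so pass the download directory rather than the repository checkout.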

Please follow these steps:

1. Install Docker.

  • Install NVIDIA Container Toolkit for GPU support.

  • Set up running Docker as a non-root user.

1. Clone this repository and cd into it.

git clone https://github.com/deepmind/alphafold.git
cd ./alphafold

1. Download genetic databases and model parameters:

  • Install aria2c. On most Linux distributions it is available via the package manager as the aria2 package (on Debian-based distributions it can be installed by running sudo apt install aria2). The same applies to rsync.

  • Please use the script scripts/download_all_data.sh to download and set up full databases. This may take substantial time (download size is 556 GB), so we recommend running this script in the background:

scripts/download_all_data.sh > download.log 2> download_all.log &
  • **Note: The download directory should *not* be a subdirectory in the AlphaFold repository directory.** If it is, the Docker build will be slow because the large databases will be copied into the Docker build context.

  • It is possible to run AlphaFold with reduced databases; please refer to the [complete documentation](#genetic-databases).
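The download prerequisites listed above (aria2c and rsync on the PATH, and a download directory outside the repository) can be checked with a short Python sketch before launching scripts/download_all_data.sh. The helpers `missing_tools` and `is_inside_repo` are hypothetical illustrations, not part of the AlphaFold repository. The example paths are placeholders.

```python
import shutil
from pathlib import Path
from typing import List, Sequence


def missing_tools(tools: Sequence[str] = ("aria2c", "rsync")) -> List[str]:
    """Return the required download tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def is_inside_repo(download_dir: str, repo_dir: str) -> bool:
    """Return True if download_dir is repo_dir itself or any directory below it."""
    download = Path(download_dir).resolve()
    repo = Path(repo_dir).resolve()
    return download == repo or repo in download.parents


if __name__ == "__main__":
    repo_dir = "."                              # the cloned alphafold checkout
    download_dir = "/data/alphafold_databases"  # hypothetical download location
    tools = missing_tools()
    if tools:
        print("Install these first:", ", ".join(tools))
    if is_inside_repo(download_dir, repo_dir):
        print("Pick a download directory outside the repository.")
```

Resolving both paths first means symlinks and relative paths such as `./databases` are compared by their real locations.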

1. Check that AlphaFold will be able to use a GPU by running:

docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

The output of this command should show a list of your GPUs. If it doesn't, check if you followed all steps correctly when setting up the NVIDIA Container Toolkit or take a look at the following NVIDIA Docker issue.
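If you prefer to script the GPU check, a sketch along these lines runs the same container through `subprocess`. This is an illustration under stated assumptions, not AlphaFold code. It swaps in `nvidia-smi -L`, which prints one `GPU 0: <name> (UUID: ...)` line per device. The helper names `parse_gpu_list` and `docker_sees_gpus` are hypothetical.

```python
import subprocess
from typing import List

# Same image as the check above, but `nvidia-smi -L` prints one line per GPU.
GPU_CHECK_CMD = ["docker", "run", "--rm", "--gpus", "all",
                 "nvidia/cuda:11.0-base", "nvidia-smi", "-L"]


def parse_gpu_list(output: str) -> List[str]:
    """Extract GPU names from lines like 'GPU 0: <name> (UUID: GPU-...)'."""
    names = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("GPU ") and ":" in line:
            name = line.split(":", 1)[1].split("(UUID", 1)[0].strip()
            names.append(name)
    return names


def docker_sees_gpus(timeout_s: int = 300) -> List[str]:
    """Run the check container; return the GPU names it reports, or [] on failure."""
    try:
        result = subprocess.run(GPU_CHECK_CMD, capture_output=True,
                                text=True, timeout=timeout_s)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return []  # docker is not installed, or the pull/run did not finish in time
    if result.returncode != 0:
        return []
    return parse_gpu_list(result.stdout)
```

Calling `docker_sees_gpus()` returns an empty list when Docker is missing, the container fails, or no GPU lines are printed. That result corresponds to the case where the command above does not show your GPUs.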

If you wish to run AlphaFold using Singularity (a common containerization platform on HPC systems), we recommend using one of the third-party Singularity setups linked in https://github.com/deepmind/alphafold/issues/10 or https://github.com/deepmind/alphafold/issues/24.

1. Build the Docker image:

docker build -f docker/Dockerfile -t alphafold .

If you encounter the following error:

W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease' is not signed.

use the workaround described in https://github.com/deepmind/alphafold/issues/463#issuecomment-1124881779.

1. Install the run_docker.py dependencies. Note: You may wish to create a Python virtual environment to prevent conflicts with your system's Python environment.

pip3 install -r docker/requirements.txt

1. Make sure that the output directory exists (the default is /tmp/alphafold) and that you have sufficient permissions to write into it.
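One way to verify this requirement is to create the directory if needed and then attempt a real write into it. This is a minimal sketch. `ensure_writable_dir` is a hypothetical helper, and `/tmp/alphafold` appears only because it is the documented default.

```python
import os
import tempfile


def ensure_writable_dir(path: str = "/tmp/alphafold") -> bool:
    """Create `path` if it is missing, then confirm a file can really be written there."""
    try:
        os.makedirs(path, exist_ok=True)
        # Creating (and automatically deleting) a real file is a more direct test
        # than inspecting permission bits, which can mislead on network filesystems.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError:
        return False
    return True
```

If `ensure_writable_dir("/tmp/alphafold")` returns `False`, choose another output directory or fix its permissions before starting a run.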

1. Run run_docker.py, pointing it to a FASTA file containing the protein sequence(s) for which you wish to predict the structure (--fasta_paths parameter). AlphaFold will search for the available templates before the date specified by the --max_template_date parameter; this can be used to avoid certain templates during modeling. --data_dir is the directory with downloaded genetic databases and --output_dir is the absolute path to the output directory.

python3 docker/run_docker.py \
--fasta_paths=your_protein.fasta \
--max_template_date=2022-01-01 \
--data_dir=$DOWNLOAD_DIR \
--output_dir=/home/user/absolute_path_to_the_output_dir
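The run_docker.py invocation above can also be assembled from Python, which lets you validate the flags before a long run. This sketch is illustrative only. `build_run_docker_cmd` is a hypothetical helper. The ISO date check mirrors the format of the example date rather than a documented requirement. Passing several FASTA files as one comma-separated `--fasta_paths` value is an assumption about the script's flag parsing.

```python
import datetime
import os
from typing import List


def build_run_docker_cmd(fasta_paths: List[str], max_template_date: str,
                         data_dir: str, output_dir: str) -> List[str]:
    """Return the run_docker.py argument list, validating the inputs first."""
    if not fasta_paths:
        raise ValueError("at least one FASTA file is required")
    # Raises ValueError unless the date parses as an ISO date such as 2022-01-01.
    datetime.date.fromisoformat(max_template_date)
    if not os.path.isabs(output_dir):
        raise ValueError(f"--output_dir must be absolute, got {output_dir!r}")
    return [
        "python3", "docker/run_docker.py",
        # Assumption: several FASTA files go in one comma-separated value.
        "--fasta_paths=" + ",".join(fasta_paths),
        f"--max_template_date={max_template_date}",
        f"--data_dir={data_dir}",
        f"--output_dir={output_dir}",
    ]


if __name__ == "__main__":
    cmd = build_run_docker_cmd(
        ["your_protein.fasta"], "2022-01-01",
        "/data/alphafold_databases",  # hypothetical value of $DOWNLOAD_DIR
        "/home/user/absolute_path_to_the_output_dir")
    print(" ".join(cmd))
    # Passing `cmd` to subprocess.run(cmd, check=True) would start the prediction.
```

Returning an argument list instead of a shell string avoids quoting problems when paths contain spaces.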

1. Once the run is over, the output directory will contain predicted structures of the target protein. Please check the documentation below for additional options and troubleshooting tips.

Genetic databases

This step requires aria2c to be installed on your machine.

AlphaFold needs multiple genetic (sequence) databases to run:

*…
