How to make an AlphaFold Singularity Image

Tom Goddard
April 6, 2022

Here is how I made an AlphaFold Singularity image for the UCSF Wynton cluster. Wynton does not allow running Docker images for security reasons, so we run AlphaFold from a Singularity image instead.

The AlphaFold GitHub repository has scripts to build a Docker image for running AlphaFold. We build that image, then convert it to a Singularity image.

Commands for building AlphaFold singularity image

The UCSF Wynton cluster does not support Docker and does not allow building Singularity images, since both require root privileges. So I do these steps on an Ubuntu 20.04 desktop where I have root access.

      $ cd ~/ucsf
      $ git clone git@github.com:deepmind/alphafold.git
      $ cd ~/ucsf/alphafold
      $ sudo docker build -f docker/Dockerfile -t alphafold220 .
      Successfully built 7f7b50fd42c9
      Successfully tagged alphafold220:latest
      $ cd ~/ucsf
      $ sudo docker save 7f7b50fd42c9 -o alphafold220_docker.tar
      $ sudo singularity build alphafold220.sif docker-archive://alphafold220_docker.tar
      $ rsync -av alphafold220.sif plato.cgl.ucsf.edu:alphafold_singularity
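If you rebuild the image often, the three build steps above can be scripted. This is a minimal sketch, not part of the AlphaFold repository: the function names are hypothetical, the ~/ucsf/alphafold location and alphafold220 tag are taken from the commands above, and it saves the image by tag rather than by image ID so nothing has to be copied by hand. It only prints the commands unless dry_run=False, and it leaves out the final rsync step.

```python
#!/usr/bin/env python3
"""Hypothetical helper that replays the Docker -> Singularity build steps."""
import subprocess
from pathlib import Path


def build_commands(repo_dir, tag="alphafold220"):
    """Return (working_dir, argv) pairs that mirror the manual build steps."""
    repo = Path(repo_dir).expanduser()
    parent = repo.parent
    tar_name = f"{tag}_docker.tar"
    return [
        # Build the Docker image from the AlphaFold Dockerfile.
        (repo, ["sudo", "docker", "build", "-f", "docker/Dockerfile", "-t", tag, "."]),
        # Save by tag rather than image ID, so the ID need not be copied by hand.
        (parent, ["sudo", "docker", "save", f"{tag}:latest", "-o", tar_name]),
        # Convert the saved Docker archive into a Singularity image file.
        (parent, ["sudo", "singularity", "build", f"{tag}.sif",
                  f"docker-archive://{tar_name}"]),
    ]


def run(commands, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for cwd, argv in commands:
        print(f"(cd {cwd} && {' '.join(argv)})")
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    run(build_commands("~/ucsf/alphafold"))
```

Running it as a normal user (not under sudo) keeps ~ pointing at your own home directory, while each step still calls sudo as in the manual commands.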

Script to run AlphaFold singularity image on Wynton

The AlphaFold GitHub repository contains a run_docker.py script for running the AlphaFold Docker image. I wrote a similar script, run_alphafold220.py, for running the AlphaFold Singularity image.
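The core of such a script is the Singularity command line that replaces `docker run`. Below is a minimal sketch of how it might be assembled; it is not the code in run_alphafold220.py. Assumptions: `singularity run` is used so the image's runscript (derived from the Docker ENTRYPOINT during the docker-archive build) launches AlphaFold, each host directory is bound at the same path inside the container, and the database path flags AlphaFold also needs (such as --uniref90_database_path) are omitted for brevity. The function name and the default date are hypothetical.

```python
import shlex
from pathlib import Path


def singularity_command(sif_path, fasta_path, output_dir, data_dir,
                        model_preset="monomer", max_template_date="2022-04-01"):
    """Build a hypothetical `singularity run` command line for AlphaFold."""
    fasta = Path(fasta_path).resolve()
    out = Path(output_dir).resolve()
    data = Path(data_dir).resolve()
    # --nv makes the host NVIDIA driver and GPUs visible inside the container.
    argv = ["singularity", "run", "--nv"]
    # Bind each host directory at the same path inside the container, so the
    # AlphaFold flags below can use host paths unchanged.
    for directory in (fasta.parent, out, data):
        argv += ["--bind", str(directory)]
    argv.append(str(sif_path))
    # Arguments after the image file are passed on to the image's runscript.
    argv += [
        f"--fasta_paths={fasta}",
        f"--output_dir={out}",
        f"--data_dir={data}",
        f"--model_preset={model_preset}",
        f"--max_template_date={max_template_date}",  # placeholder default
    ]
    return argv


if __name__ == "__main__":
    print(shlex.join(singularity_command(
        "alphafold220.sif", "seq_7p8x_A.fasta", "output", "/path/to/databases")))
```

Binding directories at identical paths is a simplification; run_docker.py instead mounts its inputs under a separate directory inside the container and rewrites the paths it passes to AlphaFold.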

The script is designed to be submitted as a job on the Wynton cluster, which uses the Sun Grid Engine (SGE) queueing system. Comment lines at the top of the script set default SGE qsub options that run the job in the GPU queue for up to 48 hours on an A40 or A100 GPU.
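This works because SGE reads qsub options from script lines that begin with `#$`, while Python treats those same lines as ordinary comments. The sketch below shows what such a header can look like and extracts the embedded options. The queue name and values in EXAMPLE_HEADER are illustrative placeholders, not the options actually used in run_alphafold220.py, and the parser simply collects every `#$` line rather than reproducing every detail of how qsub scans a script.

```python
import shlex

# Illustrative header: SGE directive syntax, placeholder values.
EXAMPLE_HEADER = """#!/usr/bin/env python3
#$ -q gpu.q
#$ -l h_rt=48:00:00
#$ -cwd

import sys
"""


def sge_directives(script_text, prefix="#$"):
    """Collect qsub options from lines that start with the SGE prefix."""
    options = []
    for line in script_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(prefix):
            # Split the rest of the line the way a shell would.
            options.extend(shlex.split(stripped[len(prefix):]))
    return options


print(sge_directives(EXAMPLE_HEADER))
# prints ['-q', 'gpu.q', '-l', 'h_rt=48:00:00', '-cwd']
```

Here `-l h_rt=48:00:00` is the SGE hard wall-clock limit, matching the 48 hour maximum described above. Options given on the qsub command line generally take precedence over embedded ones, so individual jobs can still override these defaults.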

I wrote my script starting from run_docker.py, replacing the Docker image invocation with the equivalent Singularity invocation. I also made many other changes to the AlphaFold run_docker.py script to improve ease of use:

Submitting an AlphaFold job on Wynton

Here is how to queue an AlphaFold job on Wynton:

      cd /wynton/home/ferrin/goddard/alphafold_singularity
      qsub run_alphafold220.py --fasta_paths=seq_7p8x_A.fasta

with FASTA file seq_7p8x_A.fasta containing:

>7P8X_1|Chain A|Leucotoxin LukEv|Staphylococcus aureus (1280)
MSVGLIAPLASPIQESRANTNIENIGDGAEVIKRTEDVSSKKWGVTQNVQFDFVKDKKYNKDALIVKMQGFINSRTSFSDVKGSGYELTKRMIWPFQYNIGLTTKDPNVSLINYLPKNKIETTDVGQTLGYNIGGNFQSAPSIGGNGSFNYSKTISYTQKSYVSEVDKQNSKSVKWGVKANEFVTPDGKKSAHDRYLFVQSPNGPTGSAREYFAPDNQLPPLVQSGFNPSFITTLSHEKGSSDTSEFEISYGRNLDITYATLFPRTGIYAERKHNAFVNRNFVVRYEVNWKTHEIKVKGHNKHHHHHH
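A job that waits in the GPU queue only to fail on a malformed input file is costly, so it can help to check the FASTA file before submitting. This is a minimal, hypothetical pre-submission check, not something AlphaFold or the Wynton script provides; the allowed alphabet (the 20 standard amino acids plus X) is an assumption.

```python
# Assumed allowed alphabet: the 20 standard amino acids plus X for unknown.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWYX")


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_fasta(text):
    """Return the records, or raise ValueError if any sequence looks wrong."""
    records = read_fasta(text)
    if not records:
        raise ValueError("no FASTA records found")
    for header, seq in records:
        if not seq:
            raise ValueError(f"empty sequence for {header!r}")
        unexpected = sorted(set(seq) - AMINO_ACIDS)
        if unexpected:
            raise ValueError(f"unexpected letters {unexpected} in {header!r}")
    return records
```

For example, `check_fasta(open("seq_7p8x_A.fasta").read())` should return a single record whose header starts with `7P8X_1`.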

And here is an example of a multimer prediction with two protein chains:

      qsub run_alphafold220.py --fasta_paths=seq_6z03.fasta --model_preset=multimer

where the FASTA file seq_6z03.fasta contains the two sequences:

>6Z03_1|Chains A|DNA topoisomerase I|Caldiarchaeum subterraneum (311458)
MVKWRTLVHNGVALPPPYQPKGLSIKIRGETVKLDPLQEEMAYAWALKKDTPYVQDPVFQKNFLTDFLKTFNGRFQDVTINEIDFSEVYEYVERERQLKADKEYRKKISAERKRLREELKARYGWAEMDGKRFEIANWMVEPPGIFMGRGNHPLRGRWKPRVYEEDITLNLGEDAPVPPGNWGQIVHDHDSMWLARWDDKLTGKEKYVWLSDTADIKQKRDKSKYDKAEMLENHIDRVREKIFKGLRSKEPKMREIALACYLIDRLAMRVGDEKDPDEADTVGATTLRVEHVKLLEDRIEFDFLGKDSVRWQKSIDLRNEPPEVRQVFEELLEGKKEGDQIFQNINSRHVNRFLGKIVKGLTAKVFRTYIATKIVKDFLAAIPREKVTSQEKFIYYAKLANLKAAEALNHKRAPPKNWEQSIQKKEERVKKLMQQLREAESEKKKARIAERLEKAELNLDLAVKVRDYNLATSLRNYIDPRVYKAWGRYTGYEWRKIYTASLLRKFKWVEKASVKHVLQYFAEKLAKDVDKGMQVKAAV
>6Z03_2|Chains B|DNA topoisomerase I|Caldiarchaeum subterraneum (311458)
MVKWRTLVHNGVALPPPYQPKGLSIKIRGETVKLDPLQEEMAYAWALKKDTPYVQDPVFQKNFLTDFLKTFNGRFQDVTINEIDFSEVYEYVERERQLKADKEYRKKISAERKRLREELKARYGWAEMDGKRFEIANWMVEPPGIFMGRGNHPLRGRWKPRVYEEDITLNLGEDAPVPPGNWGQIVHDHDSMWLARWDDKLTGKEKYVWLSDTADIKQKRDKSKYDKAEMLENHIDRVREKIFKGLRSKEPKMREIALACYLIDRLAMRVGDEKDPDEADTVGATTLRVEHVKLLEDRIEFDFLGKDSVRWQKSIDLRNEPPEVRQVFEELLEGKKEGDQIFQNINSRHVNRFLGKIVKGLTAKVFRTYIATKIVKDFLAAIPREKVTSQEKFIYYAKLANLKAAEALNHKRAPPKNWEQSIQKKEERVKKLMQQLREAESEKKKARIAERLEKAELNLDLAVKVRDYNLATSLRNYIDPRVYKAWGRYTGYEWRKIYTASLLRKFKWVEKASVKHVLQYFAEKLAKDVDKGMQVKAAV
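The two examples differ only in whether --model_preset=multimer is passed. A small wrapper could pick this automatically from the number of sequences in the FASTA file. This is a hypothetical convenience, not part of run_alphafold220.py; it assumes, as in the single-chain example above, that the script's default preset is monomer. SGE's qsub passes any arguments that follow the script name on to the script.

```python
import shlex


def count_fasta_records(text):
    """Count sequences by counting '>' header lines."""
    return sum(1 for line in text.splitlines() if line.lstrip().startswith(">"))


def qsub_command(fasta_path, fasta_text, script="run_alphafold220.py"):
    """Build the qsub command line, adding multimer mode for 2+ chains."""
    n = count_fasta_records(fasta_text)
    if n == 0:
        raise ValueError(f"{fasta_path} contains no FASTA records")
    # Arguments after the script name are handed to the script itself.
    argv = ["qsub", script, f"--fasta_paths={fasta_path}"]
    if n > 1:
        argv.append("--model_preset=multimer")
    return argv


if __name__ == "__main__":
    example = ">6Z03_1|Chains A\nMVKW\n>6Z03_2|Chains B\nMVKW\n"
    print(shlex.join(qsub_command("seq_6z03.fasta", example)))
    # prints qsub run_alphafold220.py --fasta_paths=seq_6z03.fasta --model_preset=multimer
```

Note that seq_6z03.fasta lists the same chain twice, so it describes a homodimer; any FASTA file with two or more records is treated as a multimer here.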

Running AlphaFold using Docker on Desktop

We also run AlphaFold on minsky.cgl.ucsf.edu, an Ubuntu 20.04 machine in the lab with Nvidia RTX 3090 graphics. For simpler use, we modified the AlphaFold run_docker.py script to provide better default values; the modified script is af220_minsky.py.
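The idea of "better default values" is that a typical run should need only the FASTA file. run_docker.py defines its options with absl flags; to keep this sketch self-contained it uses the standard library's argparse instead. Every default in DEFAULTS is a hypothetical placeholder rather than a value from af220_minsky.py, and defaulting --max_template_date to today's date is an assumed design choice.

```python
import argparse
import datetime

# Hypothetical lab-wide defaults; the real values in af220_minsky.py may differ.
DEFAULTS = {
    "data_dir": "/path/to/alphafold_databases",
    "output_dir": "alphafold_output",
    "model_preset": "monomer",
    "db_preset": "full_dbs",
}


def parse_args(argv=None):
    """Parse options so that a typical run needs only --fasta_paths."""
    parser = argparse.ArgumentParser(description="Run AlphaFold with lab defaults")
    parser.add_argument("--fasta_paths", required=True,
                        help="comma-separated FASTA files to predict")
    parser.add_argument("--data_dir", default=DEFAULTS["data_dir"])
    parser.add_argument("--output_dir", default=DEFAULTS["output_dir"])
    parser.add_argument("--model_preset", default=DEFAULTS["model_preset"],
                        choices=["monomer", "monomer_casp14", "monomer_ptm", "multimer"])
    parser.add_argument("--db_preset", default=DEFAULTS["db_preset"],
                        choices=["full_dbs", "reduced_dbs"])
    # Default to today so templates released up to the run date can be used.
    parser.add_argument("--max_template_date",
                        default=datetime.date.today().isoformat())
    args = parser.parse_args(argv)
    args.fasta_paths = args.fasta_paths.split(",")
    return args
```

With these defaults, `parse_args(["--fasta_paths=seq_7p8x_A.fasta"])` fills in every other option, while a multimer run adds only `--model_preset=multimer`.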