ESMFold Protein Structure Prediction

ESMFold predicted protein: farnesyltransferase subunit alpha colored by pLDDT confidence (blue, yellow) compared to experimental structure PDB 7t0a (white) with farnesyltransferase subunit beta shown as a surface.

Tom Goddard
November 9, 2022

You can predict a protein structure from a sequence using ESMFold (Evolutionary Scale Modeling) within ChimeraX (daily build from November 9, 2022 or newer, not in version 1.5). ChimeraX uses the prediction server provided by the ESM Metagenomic Atlas, described in this article:

Evolutionary-scale prediction of atomic level protein structure with a language model
Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Nikita Smetanin, Robert Verkuil, Ori Kabeli, Yaniv Shmueli, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Salvatore Candido, Alexander Rives

Example ChimeraX Commands

This prediction of farnesyltransferase subunit alpha takes about 20 seconds and uses the sequence from chain A of the experimental structure PDB 7t0a.

open 7T0A
esmfold predict /A

You can also make the prediction directly from the sequence, with no experimental structure.

esmfold predict MGSSHHHHHHSQDLMVTSTYIPMSQRRSWADVKPIMQDDGPNPVVPIMYSEEYKDAMDYFRAIAAKEEKSERALELTEIIVRMNPAHYTVWQYRFSLLTSLNKSLEDELRLMNEFAVQNLKSYQVWHHRLLLLDRISPQDPVSEIEYIHGSLLPDPKNYHTWAYLHWLYSHFSTLGRISEAQWGSELDWCNEMLRVDGRNNSAWGWRWYLRVSRPGAETSSRSLQDELIYILKSIHLIPHNVSAWNYLRGFLKHFSLPLVPILPAILPYTASKLNPDIETVEAFGFPMPSDPLPEDTPLPVPLALEYLADSFIEQNRVDDAAKVFEKLSSEYDQMRAGYWEFRRRECAE

Fetching precomputed models from the ESM Metagenomic Atlas

You can fetch a structure from the ESM Metagenomic Atlas in ChimeraX using its MGnify database identifier. To find identifiers, you can run a sequence search at the atlas web site, which takes a few minutes. The predicted aligned error (PAE) can also be fetched.

esmfold fetch MGYP001094276757 pae true

Limitations

  1. Maximum sequence length. The server has a maximum sequence length of 400 amino acids. You can predict a subsequence with the residueRange ChimeraX command option.

    esmfold predict /B residueRange 1,400

    or you can have ChimeraX automatically predict a long sequence in chunks with some overlap

    esmfold predict #1 chunk 400 overlap 20

  2. Less accurate than AlphaFold. ESMFold predictions are often less accurate than AlphaFold predictions.
  3. Monomeric proteins only. Protein complexes cannot be predicted. Each protein in a complex can be predicted separately.
  4. PAE not available for new predictions. Although ESMFold computes predicted aligned error (PAE), the prediction server does not give access to it. The PAE is available for fetched atlas entries.
  5. One prediction at a time. The ESM Metagenomic Atlas developers request that users run only one prediction at a time due to server capacity limitations.
  6. Server timeout. The ESM Metagenomic Atlas server sometimes times out during a prediction.
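
To illustrate the chunk and overlap options from limitation 1, here is a minimal Python sketch of how a long sequence could be divided into overlapping residue ranges. It is not the ChimeraX implementation, which may place chunk boundaries differently. The function names chunk_ranges and predict_commands are hypothetical, and the chain specifier /B is only an example. The generated command lines reuse the residueRange syntax shown above.

```python
from typing import List, Tuple


def chunk_ranges(length: int, chunk: int = 400, overlap: int = 20) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) residue ranges covering a sequence.

    Assumption: consecutive chunks share `overlap` residues and every chunk
    except possibly the last has exactly `chunk` residues. ChimeraX may
    choose boundaries differently.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    if not 0 <= overlap < chunk:
        raise ValueError("overlap must be non-negative and smaller than chunk")

    step = chunk - overlap  # how far each new chunk start advances
    ranges = []
    start = 1
    while True:
        end = min(start + chunk - 1, length)
        ranges.append((start, end))
        if end == length:
            break
        start += step
    return ranges


def predict_commands(chain_spec: str, length: int, chunk: int = 400, overlap: int = 20) -> List[str]:
    """Build one 'esmfold predict' command line per chunk (hypothetical helper)."""
    return [
        f"esmfold predict {chain_spec} residueRange {start},{end}"
        for start, end in chunk_ranges(length, chunk, overlap)
    ]


if __name__ == "__main__":
    # A 1000-residue chain gives ranges 1-400, 381-780 and 761-1000.
    for command in predict_commands("/B", 1000):
        print(command)
```

With the default values each chunk stays within the 400-residue server limit, and neighboring chunks share 20 residues that could be used to align the separately predicted pieces. Because the developers ask for one prediction at a time, the generated commands should be run one after another rather than in parallel.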