Comparing Predicted AlphaFold Protein Structures to Experimental Structures

Tom Goddard
August 6, 2021

Here are examples comparing protein structures predicted by the AlphaFold machine learning algorithm to experimental structures from electron microscopy and X-ray crystallography. Experimental structures are from the Protein Data Bank, and the predicted structures are from the AlphaFold database, which has about 365,000 structures from the proteomes of 21 organisms. The comparison is done in ChimeraX using the alphafold match command.

Advantages

Drawbacks

Examples

We look at recent structures (newer than May 2021) that were released after the AlphaFold database structures were predicted, so AlphaFold did not use these structures in making predictions. The examples include single proteins, single-protein predictions assembled into complexes, and predictions for both exact sequence matches and homologs.

Human voltage-gated sodium channel

This human voltage-gated sodium channel is a single protein with 4-fold symmetry, PDB 6LQA, and has an AlphaFold predicted structure with the exact same sequence UniProt Q14524.

ChimeraX commands to fetch the PDB and AlphaFold models and show helices as cylinders
  open 6lqa
  alphafold match #1
  preset cylinder     
AlphaFold chains matching 6lqa
Chain  UniProt Name  UniProt Id  RMSD  Length  Seen
B      SCN5A_HUMAN   Q14524      2.04  2016    1151

The command output indicates that the AlphaFold model and the experimental sample both have 2016 residues, but only 1151 are observed in the experimental structure, with a 2 Angstrom C-alpha RMSD between the predicted and experimental models. The AlphaFold model is superimposed and colored (blue to red) by the AlphaFold confidence score. The AlphaFold model has 4 large intracellular loops not seen in the experimental structure. These can be hidden, and the model given a single color, using the commands

  hide #2:1-118,430-698,943-1187,1782-2016 ribbon
  color #2 skyblue
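
The residue ranges in the hide command above are the parts of the predicted model with no counterpart in the experimental structure. As a rough sketch of how such ranges could be worked out, the Python below takes a list of observed residue numbers and the full sequence length and builds a hide command in the same form. The function names unobserved_ranges and hide_command and the toy residue list are hypothetical and used only for illustration; they are not part of ChimeraX.

```python
def unobserved_ranges(observed, length):
    """Return (start, end) residue ranges within 1..length that are not in observed."""
    seen = set(observed)
    ranges = []
    start = None
    for r in range(1, length + 1):
        if r not in seen:
            if start is None:
                start = r              # a gap begins here
        elif start is not None:
            ranges.append((start, r - 1))  # the gap ended at the previous residue
            start = None
    if start is not None:
        ranges.append((start, length))     # gap runs to the end of the sequence
    return ranges


def hide_command(model_spec, ranges, min_length=1):
    """Format a ChimeraX-style hide command for ranges of at least min_length residues."""
    kept = [(s, e) for s, e in ranges if e - s + 1 >= min_length]
    if not kept:
        return ""                          # nothing to hide
    spec = ",".join(f"{s}-{e}" for s, e in kept)
    return f"hide {model_spec}:{spec} ribbon"


# Toy data (hypothetical): a 25-residue protein with residues 4-10 and 15-20 observed.
observed = list(range(4, 11)) + list(range(15, 21))
ranges = unobserved_ranges(observed, 25)
print(ranges)
print(hide_command("#2", ranges))
print(hide_command("#2", ranges, min_length=4))
```

The min_length option is one way to hide only the long loops while leaving short unresolved stretches visible.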

Structure 6LQA from electron microscopy

AlphaFold Q14524 colored blue to red based on the AlphaFold confidence score

Superimposed models

Salmonella sugar transporter

The Salmonella sugar transporter, PDB 7L16, is not in the AlphaFold database, but its E. coli homolog, UniProt P02921, is.

ChimeraX commands to fetch the PDB and AlphaFold models and show helices as cylinders
  open 7l16
  alphafold match #1
  preset cylinder     
AlphaFold chains matching 7l16
Chain  UniProt Name  UniProt Id  RMSD  Length  Seen  % Id
A      MELB_ECOLI    P02921      2.50  472     453   85

To show a sequence alignment and color sequence differences between the Salmonella experimental structure and E. coli predicted structure use commands

  matchmaker #2 to #1 showAlignment true
  color #2 skyblue
  color #2::seq_conservation<=0 red    
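
To illustrate what a percent identity value and a list of differing positions mean for a pairwise alignment, here is a minimal Python sketch. It assumes two already-aligned sequences of equal length with "-" marking gaps, and it counts identity over columns where both sequences have a residue. The function name compare_aligned and the toy sequences are hypothetical, and the exact conventions ChimeraX uses for its % Id column and seq_conservation attribute may differ.

```python
def compare_aligned(seq_a, seq_b, gap="-"):
    """Return (percent identity, differing columns) for two aligned sequences.

    Columns are numbered from 1. Columns with a gap in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    differing = []
    for col, (a, b) in enumerate(zip(seq_a, seq_b), start=1):
        if a == gap or b == gap:
            continue                   # gapped column, not counted
        aligned += 1
        if a == b:
            identical += 1
        else:
            differing.append(col)      # both residues present but different
    percent = 100.0 * identical / aligned if aligned else 0.0
    return percent, differing


# Toy alignment (hypothetical sequences, not taken from 7L16 or P02921).
percent, differing = compare_aligned("MKTALLVAG", "MRT-LLVSG")
print(f"{percent:.1f}% identical, differences at columns {differing}")
```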

Structure 7L16 from X-ray crystallography

AlphaFold P02921 colored blue to red by confidence score

Superimposed models. Sequence differences in red

Sequence alignment in ChimeraX showing helices in pink and conservation, with sequence differences highlighted in green using the command
  select #2::seq_conservation<=0

Human separase-CDK-cyclinB1 complex

A four-protein complex composed of a separase-securin fusion, CDK1, cyclin B1 and CKS1, PDB entry 7NJ0. Separase releases chromosome pairs during mitosis and is inhibited by cyclin proteins. AlphaFold predictions of these four proteins in isolation mostly agree with the electron microscopy structure except at interfaces.

ChimeraX commands to fetch the PDB and AlphaFold models and show helices as cylinders
  open 7nj0
  alphafold match #1
  preset cylinder     
AlphaFold chains matching 7nj0
Chain  UniProt Name  UniProt Id  RMSD  Length  Seen
A      PTTG1_HUMAN   O95997      7.24  43      1188
A      ESPL1_HUMAN   Q14674      5.63  2120    1188
B      CDK1_HUMAN    P06493      3.75  297     290
C      CCNB1_HUMAN   P14635      0.65  433     270
D      CKS1_HUMAN    P61024      0.74  79      70

To color the four chains and hide the large disordered loops in the AlphaFold model that are not resolved in the experimental structure use commands

  color #1/B red
  color #1/C sienna
  color #1/D wheat
  color #2/A skyblue
  color #2/B dodgerblue
  color #2/C blue
  color #2/D lightblue
  hide #2/A:1066-1097,1298-1571 ribbons
  hide #2/C:1-161 ribbons

Structure 7NJ0 from electron microscopy

AlphaFold 5 models colored blue to red by confidence score

Superimposed models. AlphaFold blue, experiment brown.

To see regions where the structures differ, look at the per-residue C-alpha RMSD shown on the sequence alignment created with the command

  match #2.2 to #1/A showAlignment true
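
A per-residue view is useful because a single overall RMSD can hide where the disagreement is concentrated. The Python sketch below shows the arithmetic under simple assumptions: the two models are already superimposed, and the C-alpha coordinates are supplied as matched pairs in the same residue order. The function names and the toy coordinates are hypothetical, and this does not reproduce how ChimeraX pairs residues or prunes outliers when fitting.

```python
import math


def ca_distances(coords_a, coords_b):
    """Distance between each pair of matched C-alpha positions (same order, same count)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    return [math.dist(a, b) for a, b in zip(coords_a, coords_b)]


def rmsd(distances):
    """Root mean square of a non-empty list of per-residue distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# Toy coordinates in Angstroms (hypothetical): three residues agree, one is 4 A away.
experiment = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 4.0, 0.0)]

per_residue = ca_distances(experiment, predicted)
print(per_residue)
print(rmsd(per_residue))
```

In this toy case the overall RMSD is 2 Angstroms even though three of the four residues coincide exactly, which is why the per-residue values are the better guide to where two models differ.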

Experimental separase and CDK1 interface, shown in green.

AlphaFold separase intersects CDK1 at the interface, shown in yellow.

Large RMSD at separase / CDK1 interface due to incorrect AlphaFold conformation, highlighted in green.

Human separase-securin complex

A two-protein complex of separase and securin, PDB entry 7NJ1, where securin is intertwined with separase. Separase releases chromosome pairs during mitosis and is inhibited by securin. AlphaFold predictions of these two proteins in isolation do not get the right conformation, since protein-protein interactions strongly affect the securin conformation.

ChimeraX commands to fetch the PDB and AlphaFold models and show helices as cylinders
  open 7nj1
  alphafold match #1
  preset cylinder     
AlphaFold chains matching 7nj1
Chain  UniProt Name  UniProt Id  RMSD   Length  Seen
A      ESPL1_HUMAN   Q14674      11.95  2120    1448
B      PTTG1_HUMAN   O95997      22.47  202     53

To color the two chains and hide the large disordered loops in the AlphaFold model that are not resolved in the experimental structure use commands

  color #1/B red
  color #2.1 skyblue
  color #2.2 yellow  
  hide #2.1:1065-1143,1279-1572 ribbons
  hide #2.2:1-110,164-202 ribbons

Structure 7NJ1 from electron microscopy, separase brown, securin red.

AlphaFold Q14674 and O95997 colored blue to red by confidence score

Superimposed models. AlphaFold blue, experiment brown.

Securin conformation experimental red, AlphaFold yellow.

Photosystem I from a cyanobacterium

Trimeric photosystem I complex from a cyanobacterium, PDB 6VPV. The AlphaFold database has homologs for 7 of the 11 unique proteins, from rice, maize, Arabidopsis and soybean. The three homologs with the highest sequence identity (PsaA, PsaB, PsaC) have less than 1 Angstrom C-alpha RMSD difference to the experimental structure, while the other four range from 1.2 to 8.8 Angstroms. Some homologs have much shorter sequences than the experimental structure.

ChimeraX commands to fetch the PDB and AlphaFold models and color the experimental chains by polymer
  open 6vpv
  color #1 bypolymer
  alphafold match #1

Structure 6VPV from electron microscopy

AlphaFold 7 homologs colored blue to red by confidence score
AlphaFold chains matching 6vpv
Chain  UniProt Name  UniProt Id  RMSD  Length  Seen  % Id
D d 4  B4FAW3_MAIZE  B4FAW3      1.22  136     139   65
L l 0  C6T1C8_SOYBN  C6T1C8      8.80  173     160   44
A a 1  PSAA_ARATH    P56766      0.59  742     741   82
B b 2  PSAB_ORYSJ    P0C358      0.69  733     737   82
C c 3  PSAC_ORYSJ    P0C361      0.47  80      80    89
E e 5  PSAE2_ARATH   Q9S714      1.37  63      68    61
F f 6  PSAF_ARATH    Q9SHE8      4.33  154     141   53
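
With several homologs in one table, it can help to sort the rows by RMSD and check them against a cutoff. The Python sketch below does this for the rows of the 6vpv table above, entered by hand. The tuple layout, the function name below_cutoff and the 1 Angstrom cutoff are choices made here for illustration, not something produced by ChimeraX.

```python
# Rows copied from the 6vpv table: (chains, UniProt name, UniProt id, RMSD, length, seen, % id)
rows = [
    ("D d 4", "B4FAW3_MAIZE", "B4FAW3", 1.22, 136, 139, 65),
    ("L l 0", "C6T1C8_SOYBN", "C6T1C8", 8.80, 173, 160, 44),
    ("A a 1", "PSAA_ARATH", "P56766", 0.59, 742, 741, 82),
    ("B b 2", "PSAB_ORYSJ", "P0C358", 0.69, 733, 737, 82),
    ("C c 3", "PSAC_ORYSJ", "P0C361", 0.47, 80, 80, 89),
    ("E e 5", "PSAE2_ARATH", "Q9S714", 1.37, 63, 68, 61),
    ("F f 6", "PSAF_ARATH", "Q9SHE8", 4.33, 154, 141, 53),
]


def below_cutoff(rows, cutoff):
    """Return UniProt names whose RMSD is under cutoff, lowest RMSD first."""
    hits = [row for row in rows if row[3] < cutoff]
    return [row[1] for row in sorted(hits, key=lambda row: row[3])]


# List every homolog from best to worst agreement, with its percent identity.
for chains, name, uniprot_id, rmsd_value, length, seen, pct_id in sorted(rows, key=lambda r: r[3]):
    print(f"{name:14s} RMSD {rmsd_value:5.2f}  identity {pct_id}%")

print(below_cutoff(rows, 1.0))
```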