Bad ligand chemistry in AlphaFold, OpenFold and Boltz predictions

Tom Goddard and Tristan Croll
February 25, 2026

Predicting ligand binding

Structure prediction programs AlphaFold 3, OpenFold 3 and Boltz 2 appear to have little understanding of ligand chemistry, frequently predicting conformations that do not match the bond orders and chirality of the input ligands. The predicted conformations imply different hydrogens, and hence different binding properties, than the input molecule. These errors are numerous (more than 50% of ligands) and likely degrade the quality of ligand binding predictions significantly.

Predictions have wrong ligand chemistry

AlphaFold 3 does not appear to understand single versus double bonds, for instance predicting cyclohexane, a ring of 6 tetrahedral carbons, as a planar benzene ring. The predicted conformation is a different molecule with different hydrogens. Binding predictions won't be accurate if AlphaFold 3 does not understand how many hydrogens are bonded.

Cyclohexane (PubChem 8078).
Cyclohexane predicted by AlphaFold 3 comes out as benzene (planar).
AlphaFold 3 prediction with hydrogens added by ChimeraX based on conformation.

More wrong chemistry: predicted serotonin-like ligands

Here we predict serotonin and 3 serotonin-like ligands that differ in the bond orders specified by their SMILES strings, bound to a malaria protein (PDB 2qeh, not shown). AlphaFold 3 ignores the differing bond orders and predicts similar conformations for all variants.

Serotonin variants. Reference conformations from SMILES.
Serotonin variants predicted by AlphaFold 3, hydrogens inferred from conformations using ChimeraX.
AlphaFold 3 predictions for variants all have similar conformation.

Predictions with wrong chirality

AlphaFold 3 predictions frequently have the wrong chirality at ligand chiral centers. For example, here is S-lactic acid (the biologically relevant form) predicted using the SMILES string C[C@@H](C(=O)O)O, which specifies stereochemistry. 3 of the 5 predictions have the wrong chirality.

Below is another example with the antiviral drug valganciclovir, which has two chiral centers. AlphaFold 3 predicts one correctly (green) and one flipped (orange).

Prevalence of chemistry and chirality errors

AlphaFold 3, OpenFold 3 preview, and Boltz 2 all produce bad ligand chemistry and chirality in predictions for 52 antiviral drugs bound to HIV protease.

The table below shows the number of ligands (atoms) with chemistry and chirality errors in predictions of 52 antiviral ligands bound to an HIV protease dimer. Incorrect chemistry is assessed by adding hydrogen atoms inferred from the predicted ligand conformation using the ChimeraX addh command and comparing the added hydrogens with those of a reference ligand conformation.

More than half of the ligands have wrong chemistry or chirality in each of the tested programs.

                | AlphaFold | OpenFold | Boltz   | Boltz Steering
Ligands         | 50        | 52       | 52      | 50
Wrong H         | 20 (68)   | 27 (87)  | 16 (37) | 9 (29)
Flipped chirals | 25 (53)   | 25 (47)  | 27 (57) | 3 (3)
Wrong bonds     | 2         | 0        | 0       | 2
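
The hydrogen comparison behind the "Wrong H" counts can be illustrated with a minimal Python sketch. It assumes the number of hydrogens attached to each heavy atom has already been extracted (for example after running addh) into a mapping from atom name to hydrogen count. The function name and the example data are hypothetical, and this is not the code in ligchem.py.

```python
def wrong_hydrogen_atoms(ref_h_counts, pred_h_counts):
    """Return names of heavy atoms whose attached-hydrogen count differs.

    Both arguments map heavy-atom name -> number of bonded hydrogens.
    (Hypothetical helper for illustration only.)
    """
    wrong = []
    for name, ref_count in ref_h_counts.items():
        # A heavy atom missing from the prediction is treated as having 0 H.
        if pred_h_counts.get(name, 0) != ref_count:
            wrong.append(name)
    return sorted(wrong)


# Hypothetical cyclohexane-like data: each reference carbon carries 2 H,
# while hydrogens inferred from a planar, benzene-like prediction give 1 H.
reference = {f"C{i}": 2 for i in range(1, 7)}
predicted = {f"C{i}": 1 for i in range(1, 7)}

print(wrong_hydrogen_atoms(reference, predicted))
```

Counting a ligand as wrong when this list is non-empty, and summing the list lengths, would give numbers of the same kind as the "ligands (atoms)" entries in the table.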

Boltz steering potentials

The above summary table shows that Boltz with steering potentials enabled produces significantly fewer chemistry and chirality errors. The steering potentials apply forces to the atoms during the atom diffusion step of prediction. They do not affect the earlier PairFormer stage, which infers atomic interactions. We suspect that the steering potentials fix the problems too late in the process to improve binding poses and affinity predictions. They essentially fix the geometry after the ligand placement was determined without knowledge of the hydrogens and chiralities.

Potential fixes

Here are speculative ideas about how to improve the prediction programs' handling of ligand chemistry and chirality.

  1. Retrain the ML prediction programs including explicit hydrogens on ligands.
  2. Include in the training loss function a penalty for wrong chirality based on the signed tetrahedral volume for all ligand atoms with 4 substituents.
  3. Embed bond orders for ligands when computing input features.
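
As an illustration of idea 2, here is a minimal Python sketch of a chirality penalty built on the signed tetrahedral volume of the 4 substituents of one atom. The hinge form of the penalty, the function names and the coordinates are assumptions made for this example. A real training loss would be written in a differentiable framework and summed over all ligand atoms with 4 substituents.

```python
def signed_tetrahedral_volume(p1, p2, p3, p4):
    """Signed volume of the tetrahedron with vertices p1..p4 (xyz tuples).

    Swapping any two vertices, or mirroring the geometry, flips the sign.
    """
    u = [p1[i] - p4[i] for i in range(3)]
    v = [p2[i] - p4[i] for i in range(3)]
    w = [p3[i] - p4[i] for i in range(3)]
    cross_vw = (v[1] * w[2] - v[2] * w[1],
                v[2] * w[0] - v[0] * w[2],
                v[0] * w[1] - v[1] * w[0])
    return sum(u[i] * cross_vw[i] for i in range(3)) / 6.0


def chirality_penalty(ref_substituents, pred_substituents):
    """Hypothetical hinge penalty: 0 if the predicted handedness matches
    the reference, otherwise the magnitude of the wrong-signed volume."""
    v_ref = signed_tetrahedral_volume(*ref_substituents)
    v_pred = signed_tetrahedral_volume(*pred_substituents)
    sign = 1.0 if v_ref > 0 else -1.0
    return max(0.0, -sign * v_pred)


# Made-up coordinates: the 4 substituents of one atom, in a fixed order.
ref = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0)]
mirrored = [(-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0)]

print(chirality_penalty(ref, ref))       # same handedness
print(chirality_penalty(ref, mirrored))  # flipped handedness
```

The penalty is zero whenever the predicted substituents have the reference handedness, so it would only act on flipped centers.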

Chemistry and chirality errors for each ligand

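Each wrong H and flipped chirals cell gives the number of affected atoms followed by the atom names in parentheses.

ligand | heavy atoms | H | rings | heavy atoms with H | wrong H alphafold3 | wrong H openfold3 | wrong H boltz2 | wrong H boltz2 steering | chiral centers | flipped chirals alphafold3 | flipped chirals openfold3 | flipped chirals boltz2 | flipped chirals boltz2 steering
abacavir sulfate | 47 | 36 | 6 | 26 | 10 (C1 N1 C2 C3 N7 C12 C13 C15 C16 C17) | 8 (N1 N2 N7 N8 C12 C13 C26 C27) | 4 (C1 C2 C15 C16) | 8 (C1 C2 C12 C13 C15 C16 C26 C27) | 4 | 0 | 2 (C9 C23) | 0 | 0
acyclovir | 16 | 11 | 2 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
acyclovir sodium | 17 | 11 | 1 | 7 | 0 | 2 (O3 N4) | 2 (O3 N4) | 2 (O3 N4) |  | 0 | 0 | 0 | 0
adefovir dipivoxil | 34 | 32 | 2 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
atazanavir sulfate | 56 | 52 | 2 | 33 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0
baloxavir marboxil | 40 | 23 | 6 | 16 | 1 (C14) | 2 (O7 C14) | 0 | 0 | 2 | 1 (C14) | 2 (C10 C14) | 2 (C10 C14) | 0
bictegravir sodium | 33 | 18 | 4 | 13 | 3 (N3 O4 C15) | 3 (O2 N3 C15) | 2 (N3 C15) | 0 | 3 | 0 | 0 | 0 | 0
brincidofovir | 38 | 51 | 1 | 27 | 2 (C14 C15) | 0 | 0 | 0 | 1 | 1 (C21) | 0 | 1 (C21) | 0
cabotegravir sodium | 30 | 17 | 3 | 12 | 0 | 4 (N3 C4 C11 C13) | 0 | 0 | 2 | 2 (C2 C4) | 1 (C4) | 1 (C4) | 0
cidofovir | 18 | 12 | 1 | 8 | 0 | 0 | 0 | 0 | 1 | 1 (C6) | 0 | 1 (C6) | 0
darunavir | 38 | 37 | 4 | 26 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0
darunavir dihydrate | 40 | 37 | 2 | 26 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0
darunavir ethanolate | 41 | 43 | 3 | 29 | 2 (C1 C2) | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0
darunavir hydrate | 39 | 37 | 3 | 26 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0
darunavir propylene glycolate | 43 | 45 | 3 | 31 | 0 | 2 (O9 C29) | 0 | 0 | 5 | 1 (C29) | 1 (C29) | 0 | 1 (C29)
dolutegravir sodium | 31 | 19 | 3 | 13 | 0 | 0 | 0 | 0 | 2 | 0 | 2 (C2 C5) | 2 (C2 C5) | 0
elbasvir | 65 | 55 | 9 | 37 | 4 (N2 N3 N5 N6) | 4 (N2 N3 N5 N6) | 2 (N5 N6) | 0 | 5 | 2 (C4 C36) | 2 (C4 C36) | 1 (C19) | 0
elvitegravir | 31 | 22 | 3 | 14 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0
entecavir | 20 | 15 | 3 | 11 | 0 | 0 | 2 (N3 N4) | 0 | 3 | 0 | 0 | 3 (C3 C5 C6) | 0
entecavir anhydrous | 20 | 15 | 3 | 11 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 3 (C3 C5 C6) | 0
famciclovir | 23 | 19 | 2 | 10 | 3 (C4 C5 C6) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
fosamprenavir calcium | 40 | 34 | 2 | 23 | 0 | 2 (N2 C7) | 2 (C5 C6) | 0 | 3 | 1 (C16) | 2 (C6 C7) | 3 (C6 C7 C16) | 0
fostemsavir tromethamine | 49 | 36 | 4 | 22 | 2 (O1 C11) | 1 (O1) | 4 (C13 C14 C15 C16) | 4 (C13 C14 C15 C16) | 0 | 0 | 0 | 0 | 0
ganciclovir | 18 | 13 | 2 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
ganciclovir sodium | 19 | 13 | 1 | 9 | 2 (N4 O4) | 2 (N4 O4) | 2 (N4 O4) | 2 (N3 O4) | 0 | 0 | 0 | 0 | 0
glecaprevir | 58 | 46 | 7 | 29 | 4 (C3 C4 C24 C25) | 3 (N4 C24 C25) | 1 (C23) | 1 (N4) | 7 | 2 (C6 C22) | 1 (C8) | 5 (C8 C11 C13 C16 C18) | 0
grazoprevir anhydrous | 54 | 50 | 7 | 30 | 4 (C15 C16 C19 C20) | 3 (N6 C15 C16) | 3 (C18 C19 C20) | 0 | 7 | 3 (C12 C14 C37) | 2 (C35 C37) | 5 (C5 C8 C14 C35 C37) | 0
ledipasvir | 65 | 54 | 10 | 35 | 4 (N2 N3 N4 N5) | 6 (N2 N3 N4 N5 C8 C9) | 0 | 0 | 6 | 3 (C36 C39 C42) | 4 (C11 C35 C36 C39) | 2 (C36 C39) | 0
lenacapavir sodium | 65 | 32 | 6 | 20 | 0 | 3 (N2 N4 C30) | 2 (N2 N4) | 2 (N2 N4) |  | 0 | 3 (C21 C33 C35) | 3 (C21 C33 C35) | 0
letermovir | 41 | 27 | 5 | 18 | 0 | 0 | 0 | 0 | 1 | 0 | 1 (C9) | 1 (C9) | 0
lopinavir | 46 | 48 | 4 | 33 | 0 | 2 (N4 C36) | 1 (N4) | 0 | 4 | 0 | 0 | 0 | 0
maribavir | 24 | 19 | 3 | 14 | 0 | 0 | 0 | 0 | 4 | 4 (C11 C12 C13 C14) | 4 (C11 C12 C13 C14) | 4 (C11 C12 C13 C14) | 0
molnupiravir | 23 | 19 | 2 | 14 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0
nelfinavir mesylate | 45 | 49 | 3 | 31 | 0 | 4 (C21 C22 C23 C24) | 0 | 0 | 6 | 0 | 1 (C20) | 1 (C20) | 0
oseltamivir phosphate | 27 | 29 | 0 | 15 | 1 (C6) | 4 (N2 C9 C10 C11) | 2 (N1 C11) | 0 | 3 | 3 (C6 C10 C11) | 2 (C10 C11) | 1 (C11) | 0
penciclovir | 18 | 15 | 2 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
peramivir | 23 | 28 | 1 | 17 | 0 | 3 (N4 C6 C8) | 0 | 1 (N4) | 5 | 5 (C6 C7 C8 C10 C11) | 4 (C6 C8 C10 C11) | 1 (C7) | 0
pibrentasvir | 80 | 65 | 10 | 41 | 15 (C1 C2 N6 N7 C16 C17 C18 C19 C26 C27 C28 C29 C30 C50 C51) | 5 (N2 N3 N6 N7 C19) | 4 (N2 N3 N6 N7) | 4 (N3 N6 N7 C13) | 8 | 4 (C2 C16 C19 C50) | 1 (C19) | 2 (C16 C19) | 0
raltegravir potassium | 33 | 21 | 2 | 12 | 3 (O4 N6 C14) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
remdesivir | 42 | 35 | 4 | 24 | 0 | 2 (N2 C15) | 0 | 0 | 6 | 2 (P1 C8) | 1 (C8) | 0 | 0
ritonavir | 50 | 48 | 4 | 33 | 0 | 1 (C7) | 0 | 0 | 4 | 0 | 0 | 0 | 0
sofosbuvir | 36 | 29 | 3 | 20 | 0 | 0 | 0 | 0 | 6 | 1 (C10) | 3 (P1 C2 C10) | 0 | 0
tenofovir alafenamide | 33 | 29 | 3 | 18 | 0 | 0 | 0 | 0 | 3 | 3 (P1 C2 C10) | 2 (P1 C2) | 3 (P1 C2 C10) | 0
tenofovir alafenamide fumarate | 41 | 31 | 2 | 20 | 2 (C22 C23) | 2 (C22 C23) | 0 | 0 | 3 | 2 (P1 C10) | 1 (C2) | 1 (C2) | 0
tenofovir disoproxil fumarate | 43 | 32 | 1 | 17 | 2 (C20 C21) | 2 (C20 C21) | 0 | 0 | 1 | 1 (C2) | 1 (C2) | 1 (C2) | 0
tipranavir | 42 | 33 | 4 | 23 | 1 (O5) | 2 (C24 C25) | 1 (O5) | 0 | 2 | 1 (C4) | 2 (C4 C9) | 2 (C4 C9) | 0
valacyclovir hydrochloride | 24 | 21 | 1 | 11 | 0 | 0 | 0 | 0 | 1 | 1 (C4) | 0 | 0 | 0
valganciclovir | 25 | 23 | 2 | 13 | 0 | 0 | 0 | 0 | 2 | 1 (C4) | 0 | 1 (C4) | 1 (C7)
valganciclovir hydrochloride | 26 | 23 | 1 | 13 | 0 | 0 | 0 | 0 | 2 | 1 (C7) | 0 | 1 (C7) | 1 (C7)
velpatasvir | 65 | 54 | 9 | 36 | 1 (C24) | 6 (N5 N6 C8 N8 C10 C24) | 3 (N5 N6 C24) | 5 (N3 N4 N5 C24 C33) | 6 | 4 (C2 C5 C34 C39) | 1 (C39) | 3 (C7 C34 C36) | 0
voxilaprevir | 60 | 52 | 7 | 30 | 0 | 7 (C19 C20 N26 C32 C43 C44 C45) | 0 | 0 | 8 | 3 (C05 C09 C17) | 1 (C09) | 3 (C05 C17 C31) | 0
zanamivir | 23 | 20 | 1 | 15 | 2 (C5 C6) | 2 (C5 C6) | 0 | 0 | 5 | 0 | 0 | 0 | 0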

Miscellaneous bugs

Incorrect bonds. Two of the 52 AlphaFold predictions and two of the 52 Boltz steering predictions contained bonds not present in the input. This came about because the mmCIF atomic model output written by all the programs does not contain bond information for the ligands, so the bonds must be guessed from atom positions. For AlphaFold, two structures had pairs of atoms in two-component ligand compounds that were unusually close, and so an incorrect bond was formed. With Boltz steering, an oxygen of a ring structure was moved 5 Angstroms away from its correct ring position. The bad bonds were in the AlphaFold predictions for acyclovir_sodium and lenacapavir_sodium, and in the Boltz steering predictions for tipranavir and ledipasvir. We did not analyze chemistry and chirality mistakes for the ligands with incorrect bonds. The mmCIF output should contain explicit bond information for the ligands, given that the programs are prone to atom steric clashes.
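
To show why guessing bonds from positions is fragile, here is a minimal Python sketch of a generic distance-based bond heuristic. It is an assumption for illustration, not the algorithm ChimeraX uses: two atoms are bonded if their distance is at most the sum of their covalent radii plus a tolerance. The radii are approximate, and the 0.4 Angstrom tolerance, atom names and coordinates are made up.

```python
import math

# Approximate covalent radii in Angstroms (illustrative values).
COVALENT_RADII = {"C": 0.76, "N": 0.71, "O": 0.66}


def guess_bonds(atoms, tolerance=0.4):
    """Guess bonds from positions alone.

    atoms: list of (name, element, (x, y, z)).
    Returns a list of (name1, name2) pairs judged to be bonded.
    """
    bonds = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            name1, elem1, pos1 = atoms[i]
            name2, elem2, pos2 = atoms[j]
            cutoff = COVALENT_RADII[elem1] + COVALENT_RADII[elem2] + tolerance
            if math.dist(pos1, pos2) <= cutoff:
                bonds.append((name1, name2))
    return bonds


# C1-O1 is a real bond. N9 stands for an atom of a second component.
well_separated = [("C1", "C", (0.0, 0.0, 0.0)),
                  ("O1", "O", (1.43, 0.0, 0.0)),
                  ("N9", "N", (4.5, 0.0, 0.0))]
clashing = [("C1", "C", (0.0, 0.0, 0.0)),
            ("O1", "O", (1.43, 0.0, 0.0)),
            ("N9", "N", (2.9, 0.0, 0.0))]  # predicted too close to O1

print(guess_bonds(well_separated))
print(guess_bonds(clashing))
```

In the clashing case the heuristic reports an extra O1 to N9 bond that was never in the input, which is the kind of artifact explicit bond records in the mmCIF output would avoid.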

Incorrect atoms. Boltz writes mmCIF predictions with sodium atoms (Na) annotated as nitrogen and calcium atoms (Ca) annotated as carbon. This threw off comparisons that expected the same ligand atomic elements. I patched it in the comparison scripts. Ideally this would be fixed in Boltz code, although the open source Boltz project appears to have ended with the development team starting a company.

Chirality definition. Predicting incorrect chemistry leads to different placement and numbers of hydrogen atoms. That can change the ordering of the 4 substituents of a chiral center, making it appear that the chirality changed when in fact the problem is a change in hydrogens. To avoid flagging those problems as chirality errors, I measure chirality simply by the placement of the 4 neighbor atoms (ordered by atom name) when comparing reference ligand conformations to predicted conformations.
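
A minimal Python sketch of this kind of comparison follows. It orders the 4 neighbors of a center by atom name and compares the sign of the determinant formed from their positions. The function names, atom names and coordinates are hypothetical, and the actual ligchem.py implementation may differ.

```python
def handedness(neighbors):
    """Return +1, -1, or 0 (coplanar) for 4 neighbor atoms of a center.

    neighbors: dict mapping atom name -> (x, y, z). The atoms are ordered
    by name so the result does not depend on priority rules or on how
    many hydrogens each substituent carries.
    """
    p1, p2, p3, p4 = (neighbors[name] for name in sorted(neighbors))
    ax, ay, az = (p1[i] - p4[i] for i in range(3))
    bx, by, bz = (p2[i] - p4[i] for i in range(3))
    cx, cy, cz = (p3[i] - p4[i] for i in range(3))
    det = (ax * (by * cz - bz * cy)
           - ay * (bx * cz - bz * cx)
           + az * (bx * cy - by * cx))
    return (det > 0) - (det < 0)


def is_flipped(ref_neighbors, pred_neighbors):
    """True if the predicted center has the opposite handedness."""
    return handedness(ref_neighbors) * handedness(pred_neighbors) < 0


# Made-up neighbor positions around one center.
ref = {"C1": (1.0, 0.0, 0.0), "C3": (0.0, 1.0, 0.0),
       "H1": (0.0, 0.0, 1.0), "O1": (0.0, 0.0, 0.0)}
# Same arrangement rotated 90 degrees about z: handedness is unchanged.
rotated = {"C1": (0.0, 1.0, 0.0), "C3": (-1.0, 0.0, 0.0),
           "H1": (0.0, 0.0, 1.0), "O1": (0.0, 0.0, 0.0)}
# Mirror image through the xy plane: handedness is inverted.
mirrored = {"C1": (1.0, 0.0, 0.0), "C3": (0.0, 1.0, 0.0),
            "H1": (0.0, 0.0, -1.0), "O1": (0.0, 0.0, 0.0)}

print(is_flipped(ref, rotated))
print(is_flipped(ref, mirrored))
```

Because the ordering uses only atom names, a change in the inferred hydrogens does not change which handedness is reported, as long as the reference and predicted atoms share names.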

SMILES to 3D fails. To produce reference conformations for the 52 antiviral SMILES strings I used the ChimeraX SMILES to 3D service hosted by the National Cancer Institute. This can fail to produce a 3D conformation, and did fail for the ligand voxilaprevir. I got a reference 3D conformation for voxilaprevir from PubChem. AlphaFold, OpenFold and Boltz all use RDKit to get 3D conformations from SMILES strings to use in the input features. All will use "random" coordinates from RDKit if it fails to produce good geometry. That can also fail (not sure how), in which case Boltz and OpenFold fail to predict, but AlphaFold 3 sets all atom coordinates to 0. I believe the only way the input features indicate ligand chemistry and chirality is via the input coordinates. So if the coordinate calculation is bad, this will obviously prevent the programs from producing the desired chemistry and chirality.

Data files

The results shown above can be downloaded: benzene.zip, serotonin.zip, lactic_acid.zip, antivirals.zip. A ChimeraX command script ligchem.py defining ChimeraX commands ligdiff and matchnames was used to look for chemistry and chirality differences between reference and predicted ligands.