Tom Goddard
Biophysics 204B: Methods in Macromolecular Structure
January 8, 13, 15, 20, 2026
We will look at how to predict molecular structures (proteins, nucleic acids, ligands) using Boltz 2, how to design proteins using BoltzGen, and how to visualize prediction results and assess their confidence scores with ChimeraX.
Class web site.
The first 3 classes will start with a journal club presentation by teams of 3 students. Here are the papers they will present.
Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction
Saro Passaro, Gabriele Corso, Jeremy Wohlwend, Mateo Reveiz, Stephan Thaler, Vignesh Ram Somnath, Noah Getz, Tally Portnoi, Julien Roy, Hannes Stark, David Kwabi-Addo, Dominique Beaini, Tommi Jaakkola, Regina Barzilay
June 14, 2025 bioRxiv preprint
Predicting protein-protein interactions in the human proteome
Zhang J, Humphreys IR, Pei J, Kim J, Choi C, Yuan R, Durham J, Liu S, Choi HJ, Baek M, Baker D, Cong Q
Science. 2025 Oct
BoltzGen: Toward Universal Binder Design
Hannes Stark, Felix Faltings, MinGyu Choi, Yuxin Xie, Eunsu Hur, Timothy O'Donnell, Anton Bushuiev, Talip Uçar, Saro Passaro, Weian Mao, Mateo Reveiz, Roman Bushuiev, Tomas Pluskal, Josef Sivic, Karsten Kreis, Arash Vahdat, Shamayeeta Ray, Jonathan T. Goldstein, Andrew Savinov, Jacob A. Hambalek, Anshika Gupta, Diego A. Taquiri-Diaz, Yaotian Zhang, A. Katherine Hatstat, Angelika Arada, Nam Hyeong Kim, Ethel Tackie-Yarboi, Dylan Boselli, Lee Schnaider, Chang C. Liu, Gene-Wei Li, Denes Hnisz, David M. Sabatini, William F. DeGrado, Jeremy Wohlwend, Gabriele Corso, Regina Barzilay, Tommi Jaakkola
Nov 24, 2025 bioRxiv preprint
Most of the time will be spent working on team projects. There will be 3 teams of 3 students each. Here are 3 project ideas.
Goal: Effective drugs need to avoid getting pumped out of cells by ABC transporters. Can Boltz predictions of drugs bound to ABC transporters indicate whether or not they will be pumped out, or inhibit the transporter?
Details: Predict drugs bound to a human ABC transporter drug efflux pump and see if the binding affinity, binding site location, binding confidence, bound pose, or specific residue interactions correlate with experimental data on rates of clearing these drugs by the transporter.
The paper Machine Learning Modeling for ABC Transporter Efflux and Inhibition: Data Curation, Model Development, and New Compound Interaction Predictions (Oct 2025, Molecular Pharmaceutics) has a supplementary Excel spreadsheet that lists 8800 compounds that are experimentally characterized as substrates, inhibitors or neither for 4 human ABC transporters: P-gp (ABCB1, monomer, 1280 residues), BCRP (ABCG2, dimer, 1310 total residues), MRP1 (ABCC1, monomer, 1531 residues) and MRP2 (ABCC2, monomer, 1545 residues). Another source of substrate/inhibitor experimental data is the UCSF-FDA TransPortal database.
Suggested Plan: Choose one transporter (maybe P-gp or BCRP since they are smaller and predictions will be faster) and a small set (30?) of compounds, run Boltz predictions, and see if affinity, ipTM, binding site or other criteria in the structure prediction correlate with being a substrate, inhibitor or non-interacting compound. With a team of 3 students, one student could look at 10 known substrates, one at 10 known inhibitors, and one at 10 known non-binders.
These structures are too big to predict on a Mac laptop. Instead use a RunPod virtual machine or my lab's Linux / Nvidia 4090 server minsky.cgl.ucsf.edu. A ChimeraX daily build from Jan 2026 or later will have the option to run Boltz predictions using a server. Each ligand binding prediction should take about 3 minutes. A RunPod virtual machine or minsky will only run one prediction at a time, so 30 predictions would take about 90 minutes. Batches of ligands can be predicted with the ChimeraX Boltz "for each ligand" option, which is easier to set up and faster than running them one at a time. There is a ChimeraX video showing how to run predictions for a batch of ligands.
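Once the predictions are done, one simple way to ask whether a score separates two classes of compounds is a rank-based AUC: the fraction of (substrate, non-binder) pairs in which the substrate has the higher score. The sketch below assumes you have copied one score per compound (for example affinity or ipTM) into Python lists by hand. The function names and the numbers are made up for illustration and are not Boltz output.

```python
# Sketch: does a prediction score separate two classes of compounds?
# Assumption: one score per compound has already been collected by hand.

def auc(scores_a, scores_b):
    """Fraction of (a, b) pairs where the class-a score is higher; ties count half."""
    if not scores_a or not scores_b:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for a in scores_a:
        for b in scores_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(scores_a) * len(scores_b))


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Made-up placeholder numbers, NOT real Boltz results.
substrate_scores = [0.81, 0.74, 0.90]
nonbinder_scores = [0.55, 0.78, 0.60]

print("mean substrate score: ", round(mean(substrate_scores), 2))
print("mean non-binder score:", round(mean(nonbinder_scores), 2))
print("AUC:", round(auc(substrate_scores, nonbinder_scores), 2))
```

An AUC near 0.5 means the score does not distinguish the two classes. Values near 1 or near 0 both indicate separation, just in opposite directions. With only 10 compounds per class the estimate will be noisy, so treat it as a hint rather than a conclusion.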
Goal: There are over 1600 human transcription factors that bind to chromosomal DNA promoter sequences and regulate gene expression. Can Boltz predictions identify which DNA sequences a transcription factor will bind to?
Details: Boltz can correctly predict the binding of transcription factors to duplex DNA seen in experimental PDB structures. But is Boltz sensitive to the specific sequence, or does it indiscriminately place the protein in the major groove of any DNA sequence? The experimental structures use the known promoter sequences. If we predict with Boltz using non-promoter DNA sequences, or DNA that contains both the known promoter motif and non-motif regions, will it bind? Will only the correct promoter give high confidence binding?
Suggested Plan: There are many structures of human transcription factors bound to DNA in the PDB. A PDB search specifying full text human transcription factor, plus number of distinct DNA entities >= 2, plus number of polymer residues per assembly <= 400, gave 513 results. I limited the search to 400 residues so that a prediction on a Mac laptop with 16 GB could work without running out of memory. But more likely you will need to use a RunPod virtual machine or my lab's Boltz server quillian.cgl.ucsf.edu, port 30172. Here are a few examples: PDB 9jzt, 9pfn (has deep mutational scan data too), 8q9n, 5ego (interesting Nature paper about TF-TF interactions). It would be reasonable to test any small transcription factors you find. With a team of 3 students, each student could work with a different transcription factor, maybe choosing ones that exhibit different binding topologies.
A prediction that would help reveal what Boltz knows about DNA sequence specificity is to extend the DNA duplex by about 15 randomly chosen base pairs (adding complementary bases to the 2 strands), then see if Boltz binds the transcription factor to the correct promoter region as seen in the experimental structure. It would be useful to predict an ensemble of structures (e.g. 10) using the ChimeraX Boltz "Number of predicted structures" option and see if they all bind to the same DNA region.
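Extending the duplex by hand is error prone because the second strand has to be the reverse complement of the first. Here is a minimal sketch of one way to do it. It assumes a blunt-ended duplex where the two strands are exactly complementary (check this against the experimental structure, since some have overhangs), and the function names and the example sequence are made up for illustration.

```python
import random

# Watson-Crick complement table for DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def pad_duplex(strand, n_left, n_right, rng):
    """Add random bases on both sides of a strand and return both padded strands.

    strand  -- 5' to 3' sequence of the first strand of the experimental duplex
    n_left  -- number of random bases added at the 5' end
    n_right -- number of random bases added at the 3' end
    rng     -- a random.Random instance, so results are reproducible
    """
    left = "".join(rng.choice("ACGT") for _ in range(n_left))
    right = "".join(rng.choice("ACGT") for _ in range(n_right))
    padded = left + strand.upper() + right
    return padded, reverse_complement(padded)


# Hypothetical example sequence, not taken from any PDB entry.
rng = random.Random(2026)
strand1, strand2 = pad_duplex("GATTACAGATTACA", 7, 8, rng)
print("strand 1:", strand1)
print("strand 2:", strand2)
```

Splitting the 15 added base pairs between the two ends keeps the known motif away from the duplex ends. A random flank can contain a partial copy of the motif by chance, so it is worth looking at the padded sequence before predicting, and trying a few different random seeds.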
Here is an interesting review article The Human Transcription Factors (Cell 2018).
There is a ChimeraX video showing a prediction of a cyanobacteria NTCA transcription factor.
Goal: Use BoltzGen to design a protein, peptide or cyclic peptide that binds to a disease-causing fibril peptide such as amyloid beta or tau in Alzheimer's disease, alpha-synuclein in Parkinson's disease, or transthyretin in ATTR (transthyretin amyloidosis).
Details: The disease-causing peptides that form aggregates often form fibrils where many copies of the peptide stack in a beta-sheet. An example is PDB 5oqv, fibrils of amyloid-beta-42. BoltzGen can design a small protein, peptide or cyclic peptide intended to bind tightly to the amyloid-beta-42 peptide and possibly disrupt beta stacking into fibrils.
Suggested Plan: Change the BoltzGen example input file example/binding_disordered_peptides/tpp4.yaml to target amyloid-beta-42 peptide or another disease-causing fibril peptide of your choice. Run BoltzGen with a small number of designs (e.g. 10) to see if the input is correct and how long it takes. Use a RunPod virtual machine like we did on January 13 to run BoltzGen. Once a test run works, try scaling up to 100 or 1000 designs (maybe shoot for a run time of a few hours). BoltzGen developers suggest 10000-50000 designs for producing the best designs, but this project is just a proof of concept.
For a team of 3 people, one person could design a small protein binder, one could design a small peptide binder, and one could design a cyclic peptide binder (use "cyclic: True" in the .yaml input as shown in example file example/cyclic_against_hiv_antibody_site/9d3d.yaml).
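A rough way to pick the number of designs for the scaled-up run is to extrapolate from the timing of the test run. This sketch assumes run time grows linearly with the number of designs and ignores any fixed start-up cost, which may not hold for BoltzGen. The function name and the example timing are made up for illustration.

```python
def designs_in_budget(test_designs, test_minutes, budget_hours):
    """Estimate how many designs fit in a time budget, assuming linear scaling.

    test_designs -- number of designs in the small test run
    test_minutes -- wall-clock minutes the test run took
    budget_hours -- how long you are willing to let the big run go
    """
    if test_designs <= 0 or test_minutes <= 0:
        raise ValueError("test run values must be positive")
    minutes_per_design = test_minutes / test_designs
    return int(budget_hours * 60 / minutes_per_design)


# Hypothetical timing: a 10-design test run that took 20 minutes,
# and a 3 hour budget for the larger run.
print(designs_in_budget(10, 20, 3))  # prints 90
```

If the test run is dominated by start-up cost (loading model weights, for example), this will underestimate how many designs fit, so a second test run at a larger size gives a better estimate.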
To see how the design interacts with an oligomer of the disease peptide, you could predict 5 copies of the peptide with one (or more?) copies of the best design using Boltz 2. It would also be worth comparing to Boltz 2 predictions with 5 copies of just the disease peptide. Boltz 2 is unlikely to produce fibrils from the disease peptide because these often need the right nucleation conditions in vivo. Predicting 10 structures with Boltz 2 will likely produce a diversity of oligomer topologies. You can look at how disruptive the designed molecule is to disease peptide oligomers in the predictions.
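If you run Boltz 2 from the command line rather than through ChimeraX, the oligomer prediction needs a YAML input file. The sketch below writes one with several copies of the peptide and an optional binder chain. It follows the Boltz YAML input schema as I understand it (a "sequences" list of "protein" entries, where "id" can be a list of chain names to make identical copies), so check it against the Boltz documentation before relying on it. The function names are made up for illustration, and the amyloid-beta-42 sequence should be checked against PDB 5oqv.

```python
import string

# Human amyloid-beta 1-42 (42 residues); verify against PDB 5oqv before use.
ABETA42 = "DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA"


def format_ids(chain_ids):
    """Write one chain id as a plain string, several as a YAML flow list."""
    if len(chain_ids) == 1:
        return chain_ids[0]
    return "[" + ", ".join(chain_ids) + "]"


def boltz_yaml(peptide_seq, n_copies, binder_seq=None):
    """Return YAML text for n_copies of a peptide plus an optional binder chain."""
    if not 1 <= n_copies <= 25:
        raise ValueError("n_copies must be between 1 and 25")
    chain_ids = list(string.ascii_uppercase[:n_copies])
    lines = [
        "version: 1",
        "sequences:",
        "  - protein:",
        "      id: " + format_ids(chain_ids),
        "      sequence: " + peptide_seq,
    ]
    if binder_seq is not None:
        # The binder gets the next unused chain letter.
        binder_id = string.ascii_uppercase[n_copies]
        lines += [
            "  - protein:",
            "      id: " + binder_id,
            "      sequence: " + binder_seq,
        ]
    return "\n".join(lines) + "\n"


# Control prediction: 5 copies of the disease peptide with no designed binder.
print(boltz_yaml(ABETA42, 5))
```

Passing the sequence of your best design as binder_seq gives the matching input with the binder present, so the two predictions differ only by the added chain. Boltz also needs multiple sequence alignments or an option telling it where to get them, which this sketch leaves out.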