Opened 4 days ago
Last modified 3 hours ago
#20295 assigned defect
Make Boltz and OpenFold installs work with Nvidia Blackwell GPUs
| Reported by: | Tom Goddard | Owned by: | Tom Goddard |
|---|---|---|---|
| Priority: | moderate | Milestone: | |
| Component: | Structure Prediction | Version: | |
| Keywords: | Cc: | ||
| Blocked By: | Blocking: | ||
| Notify when closed: | Platform: | all | |
| Project: | ChimeraX |
Description
When installing Boltz and OpenFold, ChimeraX installs PyTorch 2.7.1 built for CUDA 12.6. That build does not work on newer Nvidia Blackwell GPUs such as the RTX 5090, which need a newer PyTorch and CUDA. Instructions for how a user can fix this by hand are on the ChimeraX Boltz web page under Limitations
https://www.rbvi.ucsf.edu/chimerax/data/boltz-apr2025/boltz_help.html#limitations
but few users will ever find them.
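The manual workaround described on that page amounts to swapping the bundled CUDA 12.6 PyTorch wheel for a CUDA 12.8 build. A rough sketch of that kind of fix is below; the virtual-environment path is an assumption (it varies by platform and install), and the authoritative steps are the ones on the linked Limitations page:

```shell
# Activate the Python virtual environment that ChimeraX created for Boltz.
# This Windows path is an assumption; adjust for your install location.
C:\Users\you\boltz22\Scripts\activate

# Reinstall the same PyTorch version, but pulled from the CUDA 12.8
# wheel index so it includes kernels for Blackwell (sm_120) GPUs.
pip install --force-reinstall torch==2.7.1 --index-url https://download.pytorch.org/whl/cu128
```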
Change History (2)
comment:1 by , 4 days ago
comment:2 by , 3 hours ago
Nvidia Blackwell architecture GPUs require CUDA 12.8. ChimeraX Boltz and OpenFold currently install PyTorch 2.7.1 built for CUDA 12.6. Online sources suggest that PyTorch 2.7.1 built for CUDA 12.8 can use Blackwell GPUs. If that is correct, I may update the ChimeraX Boltz and OpenFold installs to use PyTorch 2.7.1 for CUDA 12.8. CUDA 12.8 was released in January 2025 and CUDA 13 is the current version, so requiring CUDA 12.8 seems reasonable.
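The failure mode can be reasoned about without a GPU: a PyTorch wheel ships kernels for a fixed list of compute capabilities, and a device whose capability is not covered (sm_120 for Blackwell) hits the "no kernel image" error. Below is a minimal sketch of that check; the helper name is my own, not a torch API, and the arch list mirrors the one printed in the error report (in a real session it would come from torch.cuda.get_arch_list() and torch.cuda.get_device_capability()):

```python
def wheel_supports_device(arch_list, device_capability):
    """Return True if a wheel built for the given compute capabilities
    (entries like 'sm_90' for binary kernels or 'compute_90' for PTX)
    can run on a device with the given (major, minor) CUDA capability."""
    major, minor = device_capability
    device_sm = major * 10 + minor
    binary_sms = []
    for arch in arch_list:
        kind, _, num = arch.partition("_")
        if not num.isdigit():
            continue
        if kind == "compute" and int(num) <= device_sm:
            # PTX can be JIT-compiled forward for newer architectures.
            return True
        if kind == "sm":
            binary_sms.append(int(num))
    return device_sm in binary_sms

# Arch list from the cu126 wheel in the error report below:
cu126_archs = ["sm_50", "sm_60", "sm_61", "sm_70", "sm_75",
               "sm_80", "sm_86", "sm_90"]
# Blackwell RTX 5080/5090 report compute capability 12.0 (sm_120).
print(wheel_supports_device(cu126_archs, (12, 0)))  # False: no kernel image
print(wheel_supports_device(cu126_archs, (8, 6)))   # True: e.g. an RTX 3090
```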
First I need to test whether Boltz and OpenFold can run on Blackwell with torch 2.7.1/CUDA 12.8. I don't have a Blackwell GPU, but I can try one on RunPod.
The current PyTorch release is 2.11, so I may also want to update from 2.7.1. However, Boltz inference with torch 2.8 and 2.9 was much slower on Mac (1.5x longer runtime), which is why I kept the version at 2.7.1. It might be worth testing speed on Mac with PyTorch 2.11.
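Comparing PyTorch versions fairly needs repeated, warmed-up timed runs rather than a single measurement, since first-run weight loading and kernel compilation can dominate. A small harness for that is sketched below; run_prediction is a stand-in workload, not a real Boltz API call:

```python
import statistics
import time

def benchmark(fn, repeats=3, warmup=1):
    """Run fn() warmup times untimed, then repeats times timed.
    Returns (median seconds, list of all timings). The median resists
    one-off stalls such as model weight loading on the first call."""
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings), timings

# Stand-in workload; replace with an actual Boltz prediction call.
def run_prediction():
    sum(i * i for i in range(100_000))

median, timings = benchmark(run_prediction)
print(f"median {median:.4f}s over {len(timings)} runs")
```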
This problem was reported by Trevor Sewell on the ChimeraX mailing list:
From: trevor SEWELL via ChimeraX-users
Subject: [chimerax-users] Error using Boltz in Windows 11 with ChimeraX 1.11.1
Date: May 8, 2026 at 2:46:14 PM PDT
To: "chimerax-users@cgl.ucsf.edu"
Reply-To: trevor SEWELL

How do I remedy the following error with Boltz predictions on a Windows 11 system with ChimeraX 1.11.1? I have tried CUDA 13.2 and CUDA 12.8. I am not sufficiently skilful to obey the following instructions:

> NVIDIA GeForce RTX 5080 Laptop GPU with CUDA capability sm_120 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90. If you want to use the NVIDIA GeForce RTX 5080 Laptop GPU GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

Many thanks for your kind help!
All the best
Trevor

Running boltz prediction failed with exit code 1:

command:

```
C:\Users\sewel\boltz22\Scripts\boltz.exe predict C:\Users\sewel\Desktop\boltz_omega_amidase\omega_amidase.yaml --accelerator gpu --no_kernels
```

stdout:

```
Boltz version 2.2.0
Checking input data.
Processing 1 inputs with 1 threads.
Running structure prediction for 1 input.
```

stderr:

```
  0%|          | 0/1 [00:00<?, ?it/s]
Using bfloat16 Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
C:\Users\sewel\boltz22\Lib\site-packages\pytorch_lightning\trainer\connectors\logger_connector\logger_connector.py:76: Starting from v1.9.0, tensorboardX has been removed as a dependency of the pytorch_lightning package, due to potential conflicts with other packages in the ML ecosystem. For this reason, logger=True will use CSVLogger as the default logger, unless the tensorboard or tensorboardX packages are found. Please pip install lightning[extra] or one of them to enable TensorBoard support by default
Fri May 8 10:27:30 2026: Loading Boltz structure prediction weights
C:\Users\sewel\boltz22\Lib\site-packages\pytorch_lightning\utilities\migration\utils.py:56: The loaded checkpoint was produced with Lightning v2.5.0.post0, which is newer than your current Lightning version: v2.5.0
Fri May 8 10:28:00 2026: Finished loading Boltz structure prediction weights
Fri May 8 10:28:00 2026: Starting structure inference
C:\Users\sewel\boltz22\Lib\site-packages\torch\cuda\__init__.py:287: UserWarning: NVIDIA GeForce RTX 5080 Laptop GPU with CUDA capability sm_120 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90. If you want to use the NVIDIA GeForce RTX 5080 Laptop GPU GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
  warnings.warn(
You are using a CUDA device ('NVIDIA GeForce RTX 5080 Laptop GPU') that has Tensor Cores. To properly utilize them, you should set torch.set_float32_matmul_precision('medium' | 'high') which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
C:\Users\sewel\boltz22\Lib\site-packages\pytorch_lightning\trainer\connectors\data_connector.py:420: Consider setting persistent_workers=True in 'predict_dataloader' to speed up the dataloader worker initialization.
Traceback (most recent call last):
  File "C:\Users\sewel\boltz22\Lib\site-packages\pytorch_lightning\trainer\call.py", line 47, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "C:\Users\sewel\boltz22\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 898, in _predict_impl
    results = self._run(model, ckpt_path=ckpt_path)
  File "C:\Users\sewel\boltz22\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 982, in _run
    results = self._run_stage()
  File "C:\Users\sewel\boltz22\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1021, in _run_stage
    return self.predict_loop.run()
  File "C:\Users\sewel\boltz22\Lib\site-packages\pytorch_lightning\loops\utilities.py", line 179, in _decorator
    return loop_run(self, *args, **kwargs)
  File "C:\Users\sewel\boltz22\Lib\site-packages\pytorch_lightning\loops\prediction_loop.py", line 105, in run
    self.setup_data()
  File "C:\Users\sewel\boltz22\Lib\site-packages\pytorch_lightning\loops\prediction_loop.py", line 162, in setup_data
    length = len(dl) if has_len_all_ranks(dl, trainer.strategy, allow_zero_length) else float("inf")
  File "C:\Users\sewel\boltz22\Lib\site-packages\pytorch_lightning\utilities\data.py", line 105, in has_len_all_ranks
    if total_length == 0:
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\sewel\boltz22\Scripts\boltz.exe\__main__.py", line 7, in <module>
  File "C:\Users\sewel\boltz22\Lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\sewel\boltz22\Lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "C:\Users\sewel\boltz22\Lib\site-packages\click\core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Users\sewel\boltz22\Lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\sewel\boltz22\Lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\sewel\boltz22\Lib\site-packages\boltz\main.py", line 1355, in predict
    trainer.predict(
  File "C:\Users\sewel\boltz22\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 859, in predict
    return call._call_and_handle_interrupt(
  File "C:\Users\sewel\boltz22\Lib\site-packages\pytorch_lightning\trainer\call.py", line 68, in _call_and_handle_interrupt
    trainer._teardown()
  File "C:\Users\sewel\boltz22\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1005, in _teardown
    self.strategy.teardown()
  File "C:\Users\sewel\boltz22\Lib\site-packages\pytorch_lightning\strategies\strategy.py", line 536, in teardown
    self.lightning_module.cpu()
  File "C:\Users\sewel\boltz22\Lib\site-packages\lightning_fabric\utilities\device_dtype_mixin.py", line 82, in cpu
    return super().cpu()
  File "C:\Users\sewel\boltz22\Lib\site-packages\torch\nn\modules\module.py", line 1133, in cpu
    return self._apply(lambda t: t.cpu())
  File "C:\Users\sewel\boltz22\Lib\site-packages\torch\nn\modules\module.py", line 915, in _apply
    module._apply(fn)
  File "C:\Users\sewel\boltz22\Lib\site-packages\torchmetrics\metric.py", line 907, in _apply
    _dummy_tensor = fn(torch.zeros(1, device=self.device))
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```