Opened 6 years ago
Closed 3 years ago
#2906 closed enhancement (wontfix)
Building OpenMM installer in devtoolset-7
| Reported by: | Tristan Croll | Owned by: | Tristan Croll |
|---|---|---|---|
| Priority: | moderate | Milestone: | |
| Component: | Build System | Version: | |
| Keywords: | | Cc: | Greg Couch, Tom Goddard, Eric Pettersen |
| Blocked By: | | Blocking: | |
| Notify when closed: | | Platform: | all |
| Project: | ChimeraX | | |
Description
I've just compiled a working OpenMM 7.5 installer in devtoolset-7, installed it into ChimeraX (built in devtoolset-7 on my machine) and built ISOLDE against it - all appears to work fine (although the ChimeraX build has a tendency to segfault on exit - probably something I've done wrong). I'm out of time today, but will summarise what I've done tomorrow. The build is based on the nvidia/cuda:10.2-devel-centos7 Docker image, with modified versions of the scripts at https://github.com/openmm/openmm/tree/master/devtools/packaging/scripts/linux. With a little more modification, their use of miniconda could easily be removed in favour of using ChimeraX's Python.
Change History (15)
comment:1 by , 6 years ago
| Component: | Unassigned → Build System |
|---|
follow-up: 2 comment:2 by , 6 years ago
I'm afraid I'm unfamiliar with how exactly you're statically linking the required libstdc++ symbols when building ChimeraX itself (and I don't really speak Makefile all that well), but it appears I'm going to need to use the same approach when building OpenMM. With everything built in devtoolset-7 but running in the native CentOS 7 environment, "vanilla" ChimeraX starts and closes fine, but if I start ChimeraX, start ISOLDE, then close ChimeraX I get the following traceback, which I *think* suggests it's attempting to use the std::string destructor from the system libstdc++ on a GCC 7.3 string. Actually loading a model and running simulations in ISOLDE using OpenCL or CUDA works just fine - it's just this crash at the end that concerns me.

```
*** Error in `/home/tic20/chimerax-git/chimerax/ChimeraX.app/bin/ChimeraX': double free or corruption (!prev): 0x00000000057bd4f0 ***
#0  0x00007ffff74d9337 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007ffff74daa28 in __GI_abort () at abort.c:90
#2  0x00007ffff751be87 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7ffff762e3b8 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007ffff7524679 in _int_free (ar_ptr=0x7ffff786a760 <main_arena>, ptr=<optimized out>, str=0x7ffff762e4e0 "double free or corruption (!prev)", action=3) at malloc.c:4967
#4  0x00007ffff7524679 in _int_free (av=0x7ffff786a760 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3843
#5  0x00007fffe6c38b63 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (__a=..., this=<optimized out>) at /usr/src/debug/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_string.h:539
#6  0x00007fffe6c38b63 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (this=<optimized out>, __in_chrg=<optimized out>) at /usr/src/debug/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_string.h:539
#7  0x00007ffff74dcc99 in __run_exit_handlers (status=status@entry=0, listp=0x7ffff786a6c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#8  0x00007ffff74dcce7 in __GI_exit (status=status@entry=0) at exit.c:99
#9  0x00007ffff7a26d39 in Py_Exit (sts=sts@entry=0) at Python/pylifecycle.c:2292
#10 0x00007ffff78d93be in handle_system_exit () at Python/pythonrun.c:636
#11 0x00007ffff7a30b17 in PyErr_PrintEx () at Python/pythonrun.c:715
#12 0x00007ffff7a30b17 in PyErr_PrintEx (set_sys_last_vars=set_sys_last_vars@entry=1) at Python/pythonrun.c:646
#13 0x00007ffff7a30b2a in PyErr_Print () at Python/pythonrun.c:542
#14 0x00007ffff7a5122d in pymain_run_module (modname=<optimized out>, set_argv0=set_argv0@entry=1) at Modules/main.c:323
#15 0x00007ffff7a56e19 in pymain_main (pymain=0x7fffffffd700) at Modules/main.c:2865
#16 0x00007ffff7a56e19 in pymain_main (pymain=pymain@entry=0x7fffffffd700) at Modules/main.c:3029
#17 0x00007ffff7a572d2 in Py_Main (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:3052
#18 0x00000000004008b7 in main ()
```
follow-up: 3 comment:3 by , 6 years ago
I suspect I’ve been going around in circles all day, but this seems relevant: https://community.rti.com/forum-topic/double-free-error-exiting-certain-applications. I think the issue is that libOpenMM will need to link against the ChimeraX library that provides the stdc++ symbols, and drop its link to libstdc++. Will try tomorrow.
comment:4 by , 6 years ago
This is a great experiment, and I'd like for you to figure it out, but I wonder why you aren't using the OpenMM C API? Looking at the OpenMM binaries on anaconda, there are multiple variants, one for each of several different CUDA versions. Using those "official" binaries simplifies updates, and an OpenMM installation tool could install the one that matches the version of CUDA installed on the system. And if the C API is used, then which C++ compiler was used to build OpenMM wouldn't matter.
That said, you should look at how devtoolset's libstdc++.so is implemented. libstdc++.so is actually a linker script that layers additional functionality on top of the system's libstdc++.so:

```
/* GNU ld script
   Use the shared library, but some functions are only in
   the static library, so try that secondarily.  */
OUTPUT_FORMAT(elf64-x86-64)
INPUT ( /usr/lib64/libstdc++.so.6 -lstdc++_nonshared )
```
So your analysis of the bug could be correct if the bug is in the nonshared, i.e., static, part. But I would have expected Red Hat to handle std::string correctly in this case. Or maybe the bug is that ISOLDE is statically linking libstdc++?
comment:5 by , 6 years ago
Forgot to mention that ChimeraX only dynamically links against libstdc++.
follow-up: 6 comment:6 by , 6 years ago
I did think about the C-API approach, but there are a few things I need to do that fairly unavoidably require using the C++ objects. The other approach I *could* use rests on the fact that I don’t currently have any libraries that simultaneously link both OpenMM and ChimeraX. So I *could* technically precompile my OpenMM-requiring libraries using its matching compiler before moving on to the rest of the build... but that makes for a much less neat build system than the current one. We’ve talked back and forth on this a few times, but in the long term it would probably be best to have OpenMM in its own bundle to allow upgrading independently of the ChimeraX core. This exercise seems like a good step in that direction.
comment:7 by , 6 years ago
Tracked down the cause: OpenMM uses runtime dynamic loading via dlopen to load its plugin libraries, and is calling it with the RTLD_GLOBAL flag (see https://stackoverflow.com/questions/11005881/static-library-loaded-twice). Once upon a time RTLD_GLOBAL was necessary to allow dynamic casting and passing of exceptions between the libraries, but that hasn't been the case since GCC 4.5 (https://github.com/aalexand/sharedlib_typeinfo). If I switch to the RTLD_LOCAL flag instead everything appears to work fine and the crash goes away. I've put this in as a pull request on the OpenMM GitHub - since the latest code requires at least GCC 5, it should be a no-brainer.
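For reference, this is the shape of the change - not OpenMM's actual plugin-loading code, just a minimal C sketch in which the plugin name and entry point ("libExamplePlugin.so", "registerPlugin") are invented for illustration:

```c
/* Minimal sketch of dlopen-based plugin loading (not OpenMM's real code).
 * RTLD_LOCAL keeps the plugin's symbols private to this handle, whereas
 * RTLD_GLOBAL merges them into the global symbol table, where any
 * statically linked C++ runtime pieces can collide with the host's
 * copies and lead to the kind of double free described above. */
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    void *handle = dlopen("libExamplePlugin.so", RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    /* "registerPlugin" is a made-up entry point for illustration only. */
    void (*register_plugin)(void) = (void (*)(void)) dlsym(handle, "registerPlugin");
    if (register_plugin)
        register_plugin();

    /* Plugins normally stay loaded for the life of the process; dlclose
     * here just keeps the example self-contained. */
    dlclose(handle);
    return 0;
}
```

(Link with -ldl on glibc versions that still require it.)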
follow-up: 8 comment:8 by , 6 years ago
I've just spent the last couple of hours using this build to work with a real case (a particularly nasty low-resolution crystal dataset) and everything seems perfectly stable.
follow-up: 9 comment:9 by , 6 years ago
Peter Eastman on the OpenMM team just accepted my pull request. How about this: when OpenMM 7.5 is released, we use the official builds for Mac and Windows, and I can provide a devtoolset-7 build for Linux? My workstation is already CentOS 7 with all the prerequisites, and I can also put together a singularity recipe for later use.
comment:10 by , 6 years ago
Sounds good. I am still worried about how to handle multiple versions of CUDA.
follow-up: 11 comment:11 by , 6 years ago
I wouldn’t worry too much about that. I only set CUDA as the default on Linux anyway. Yes, it gives a slightly higher simulation rate - but it requires the presence of a compiler and is a bit slower getting started compared to OpenCL. For Linux users who don’t have the correct CUDA version, it’ll silently fall back to OpenCL.
comment:12 by , 6 years ago
A repeat make install in the ChimeraX top-level directory after git pull fails for me in vdocs:
```
/home/tic20/chimerax-git/chimerax/ChimeraX.app/bin/ChimeraX --nogui --silent --exit --script '_vdoc.py build'
user/index.html: already exists and is not a symlink
make[1]: *** [Makefile:8: install] Error 1
make[1]: Leaving directory '/home/tic20/chimerax-git/chimerax/vdocs'
make: *** [Makefile:38: install] Error 2
```
I can see the reason: _vdoc.generate_user_index() creates a real index.html in the code directory, and on subsequent runs this trips up _vdoc.check_symlink(), which expects everything in the directory to be a symlink.
comment:13 by , 6 years ago
By the way: if you want to bump up to devtoolset-7 earlier for ChimeraX 0.93 just let me know - I can do the minimal required patch to the OpenMM 7.4 source and provide a matching build.
comment:14 by , 6 years ago
I've had a bit more of a look at OpenMM's C API, and it turns out Greg's suggestion may well be possible. It'll take a bit of work (and care - I'm not particularly experienced in C, so will have to be extra careful with memory management), but I *should* be able to get there.
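The memory-management concern is that the C wrapper hands back opaque handles that the caller must destroy explicitly. A rough sketch of that pattern, assuming the OpenMM_Class_method naming used by OpenMM's generated C wrapper (OpenMMCWrapper.h) - the exact calls should be checked against the installed header:

```c
/* Rough sketch of the create/use/destroy discipline the OpenMM C wrapper
 * imposes. Assumes the OpenMM_Class_method naming of OpenMMCWrapper.h;
 * verify signatures against the installed header before relying on them. */
#include "OpenMMCWrapper.h"

static void build_and_discard_system(void) {
    /* Opaque handle: the caller owns it and must destroy it explicitly. */
    OpenMM_System* system = OpenMM_System_create();

    OpenMM_System_addParticle(system, 1.008); /* mass in amu */

    /* No RAII as in the C++ API -- forgetting this leaks the System. */
    OpenMM_System_destroy(system);
}
```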
comment:15 by , 3 years ago
| Resolution: | → wontfix |
|---|---|
| Status: | assigned → closed |
Closing out this ancient ticket, since things have well and truly moved on.