As noted in Bug 4181:

When CheckpointType=blcr is configured, running xterm within an interactive srun produces an error:

$ srun --x11 --pty /bin/bash
[deng@c26-04 ~]$ xterm
ERROR: ld.so: object 'libcr_run.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libcr_run.so' from LD_PRELOAD cannot be preloaded: ignored.

This is because the blcr code explicitly sets LD_PRELOAD to libcr_run.so. The ld.so man page explains why the error message appears:

LD_PRELOAD
    A list of additional, user-specified, ELF shared objects to be
    loaded before all others. The items of the list can be separated
    by spaces or colons. This can be used to selectively override
    functions in other shared objects. The objects are searched for
    using the rules given under DESCRIPTION.
    *** In secure-execution mode, preload pathnames containing slashes
    are ignored, and only shared objects in the standard search
    directories that have the set-user-ID mode bit enabled are
    loaded. ***

e.g.

brian@lappy:~/slurm/17.02/lappy$ locate libcr_run.so
/usr/lib/libcr_run.so
/usr/lib/libcr_run.so.0
/usr/lib/libcr_run.so.0.5.5
brian@lappy:~/slurm/17.02/lappy$ ls -l /usr/lib/libcr_run.so
lrwxrwxrwx 1 root root 18 Aug 4 2016 /usr/lib/libcr_run.so -> libcr_run.so.0.5.5
brian@lappy:~/slurm/17.02/lappy$ ls -l /usr/lib/libcr_run.so.0.5.5
-rw-r--r-- 1 root root 10336 Aug 4 2016 /usr/lib/libcr_run.so.0.5.5
brian@lappy:~/slurm/17.02/lappy$ sudo chmod +s /usr/lib/libcr_run.so
brian@lappy:~/slurm/17.02/lappy$ ls -l /usr/lib/libcr_run.so.0.5.5
-rwSr-Sr-- 1 root root 10336 Aug 4 2016 /usr/lib/libcr_run.so.0.5.5
brian@lappy:~/slurm/17.02/lappy$ LD_PRELOAD=libcr_run.so xterm
brian@lappy:~/slurm/17.02/lappy$ LD_PRELOAD=blah xterm
ERROR: ld.so: object 'blah' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object 'blah' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object 'blah' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
brian@lappy:~/slurm/17.02/lappy$ sudo chmod -s /usr/lib/libcr_run.so
brian@lappy:~/slurm/17.02/lappy$ LD_PRELOAD=libcr_run.so xterm
ERROR: ld.so: object 'libcr_run.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object 'libcr_run.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
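
The two conditions in the quoted man page text can be checked without launching xterm. The following is a minimal sketch, not part of Slurm or BLCR: the helper names exec_mode and preload_status are hypothetical, and the sketch assumes that the program enters secure-execution mode because it is installed set-user-ID or set-group-ID (a common trigger, but not confirmed for xterm on the affected system in this bug). It inspects only the mode bits and does not verify that the library sits in a standard search directory.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Slurm or BLCR.

# Print "secure" if the program carries the set-user-ID or set-group-ID bit
# (assumed here to be what puts ld.so into secure-execution mode),
# otherwise print "normal".
exec_mode() {
    if [ -u "$1" ] || [ -g "$1" ]; then
        echo "secure"
    else
        echo "normal"
    fi
}

# Print "eligible" if the library has the set-user-ID bit that the man page
# requires for preloads in secure-execution mode, otherwise print "ignored".
# test -u follows symlinks, so libcr_run.so resolves to its versioned target.
preload_status() {
    if [ -u "$1" ]; then
        echo "eligible"
    else
        echo "ignored"
    fi
}
```

For the system shown above, one might run exec_mode "$(command -v xterm)" and preload_status /usr/lib/libcr_run.so; a result of "secure" together with "ignored" would be consistent with the error messages in the transcript.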
I'm investigating why the code explicitly sets LD_PRELOAD. We'll get back to you on what we find.
On a side note: in the past we have encouraged others to consider alternatives to BLCR, such as SCR[1]. BLCR has known limitations. One is that it requires each pid that was running at the time of the checkpoint to be available for its own use when the restart happens. We have also had concerns in the past that might have been corrected with time, but maybe not: BLCR assumes that files do not change over time and does not check this, it doesn't support GPUs, it causes problems if the application has to talk to a license server, and so on.

[1] http://computation.llnl.gov/projects/scalable-checkpoint-restart-for-mpi
After further investigation, this appears to be an issue with how xterm handles LD_PRELOAD. We could do some things to prevent the error from happening but that would just be masking that blcr won't work with xterm.

Further, we've decided to deprecate the BLCR plugin in 17.11 and remove it in 18.08. BLCR can still be used but it will have to be run manually by the user. We recommend investigating the other alternatives such as SCR and DMTCP.

Let us know if you have any questions.

Thanks,
Brian
Deprecating BLCR sounds good to me. Project seems to be inactive since ~January 2013.