Hi Slurm Experts:

We are planning an upgrade from 17.02.1-2 to 17.02.7 in one week. Please help answer some questions.

1/
Are there MySQL DB schema changes across the version gap? If yes, could you advise how to safely upgrade slurmdbd?

2/
Our StateSaveLocation is currently /tmp. We want to move it to /opt/slurm/data/statesave. How can we safely do that during the upgrade downtime?

3/
Is there anything else we need to pay special attention to?

Thank you very much!
(In reply to NYU HPC Team from comment #0)
> Hi Slurm Experts:
>
> We are planning an upgrade from 17.02.1-2 to 17.02.7 in one week. Please
> help answer some questions.
>
> 1/
> Are there MySQL DB schema changes across the version gap? If yes, could you
> advise how to safely upgrade slurmdbd?

No. Schema changes only happen in a major release, not within maintenance releases, so the next schema change will come with 17.11.0.

> 2/
> Our StateSaveLocation is currently /tmp. We want to move it to
> /opt/slurm/data/statesave. How can we safely do that during the upgrade
> downtime?

With slurmctld stopped, relocate the various state files into the new directory, point StateSaveLocation in slurm.conf at it, and you should be fine.

> 3/
> Is there anything else we need to pay special attention to?

If you have any SPANK plugins, such as the x11 plugin, they'll need to be rebuilt against the new release. Aside from that minor caveat, upgrading between maintenance releases should be straightforward and relatively painless.
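A minimal sketch of that relocation, assuming slurmctld is stopped for the whole copy. Because /tmp is shared with everything else on the host, the sketch copies only names matching a pattern list rather than the whole directory. That default list ('*_state*', 'hash.*', 'clustername', and so on) is an assumption, not an authoritative inventory of Slurm's state files, so compare it against what slurmctld actually wrote to /tmp before relying on it. The helper name copy_statesave is hypothetical. It copies rather than moves, so the originals stay in /tmp as a fallback until the controller has started cleanly from the new location.

```shell
# Hypothetical helper: copy Slurm controller state from an old
# StateSaveLocation to a new one. Run it only while slurmctld is stopped.
# Usage: copy_statesave OLD_DIR NEW_DIR [PATTERN ...]
copy_statesave() {
    src=$1
    dst=$2
    shift 2
    # ASSUMED default patterns; verify them against the files actually in $src.
    if [ "$#" -eq 0 ]; then
        set -- '*_state*' 'assoc_usage*' 'qos_usage*' 'last_tres*' \
               'priority_last_decay_ran*' 'hash.*' 'clustername'
    fi
    mkdir -p "$dst" || return 1
    for pattern in "$@"; do
        # $pattern is left unquoted so the shell expands it inside $src.
        for item in "$src"/$pattern; do
            # An unmatched glob stays literal; skip it.
            [ -e "$item" ] || continue
            # -R recurses into hash.* directories; -p keeps mode and timestamps
            # (keeping ownership of other users' files requires root).
            cp -Rp "$item" "$dst"/ || return 1
        done
    done
}
```

For this thread's case that would be "copy_statesave /tmp /opt/slurm/data/statesave", run as root. After that, make sure the SlurmUser can write to the new directory (for example with chown -R, substituting your actual SlurmUser), update StateSaveLocation in slurm.conf, and then start slurmctld.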
Thank you Tim.

Yes, I rebuilt the slurm-spank-x11 RPM. With both the new RPM and the previous one built against the older Slurm, I see the message below when running the 'xterm' command, although the command itself runs fine in both cases.

$ srun --x11 --pty /bin/bash
[deng@c26-04 ~]$ xterm
ERROR: ld.so: object 'libcr_run.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libcr_run.so' from LD_PRELOAD cannot be preloaded: ignored.

However, when I run an interactive R session and plot a histogram, I do not see the warning message above.
Are you doing checkpoint/restart (BLCR)? Do you by chance have LD_PRELOAD defined in your environment?

Does this help:
https://upc-bugs.lbl.gov/blcr/doc/html/FAQ.html#preload

Are you able to ssh (-X) directly to the node and run xterm? If so, do you get the same error?
Yes, we are starting to use BLCR with Slurm. The library files are in a standard directory:

$ ldconfig -p | grep libcr_run
libcr_run.so.0 (libc6,x86-64) => /lib64/libcr_run.so.0
libcr_run.so (libc6,x86-64) => /lib64/libcr_run.so
$ ls -l /lib64/libcr_run.so*
lrwxrwxrwx 1 root root 18 Jun 6 17:02 /lib64/libcr_run.so -> libcr_run.so.0.5.5
lrwxrwxrwx 1 root root 18 Jun 6 17:02 /lib64/libcr_run.so.0 -> libcr_run.so.0.5.5
-rwxr-xr-x 1 root root 10176 Apr 5 15:10 /lib64/libcr_run.so.0.5.5

Before srun, LD_PRELOAD is not defined. Inside an srun job it is defined, but without a path:

$ echo $LD_PRELOAD

$ srun --x11 --pty /bin/bash
[deng@c26-04 ~]$ echo $LD_PRELOAD
libcr_run.so

This part of the code seems to be related:
https://github.com/SchedMD/slurm/blob/8d596cfc9136c6a3b624b37b8ea1881ae28f5ec1/src/plugins/checkpoint/blcr/checkpoint_blcr.c#L383
I get the same error message:

brian@lappy:~/slurm/17.02/lappy$ echo $LD_PRELOAD

brian@lappy:~/slurm/17.02/lappy$ srun -pdebug --pty $SHELL
brian@lappy:~/slurm/17.02/lappy$ echo $LD_PRELOAD
libcr_run.so
brian@lappy:~/slurm/17.02/lappy$ xterm
ERROR: ld.so: object 'libcr_run.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object 'libcr_run.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.

The reason the error message appears is explained by the ld.so man page:

    LD_PRELOAD
        A list of additional, user-specified, ELF shared objects to be loaded before all others. The items of the list can be separated by spaces or colons. This can be used to selectively override functions in other shared objects. The objects are searched for using the rules given under DESCRIPTION. *** In secure-execution mode, preload pathnames containing slashes are ignored, and only shared objects in the standard search directories that have the set-user-ID mode bit enabled are loaded. ***

e.g.:

brian@lappy:~/slurm/17.02/lappy$ locate libcr_run.so
/usr/lib/libcr_run.so
/usr/lib/libcr_run.so.0
/usr/lib/libcr_run.so.0.5.5
brian@lappy:~/slurm/17.02/lappy$ ls -l /usr/lib/libcr_run.so
lrwxrwxrwx 1 root root 18 Aug 4 2016 /usr/lib/libcr_run.so -> libcr_run.so.0.5.5
brian@lappy:~/slurm/17.02/lappy$ ls -l /usr/lib/libcr_run.so.0.5.5
-rw-r--r-- 1 root root 10336 Aug 4 2016 /usr/lib/libcr_run.so.0.5.5
brian@lappy:~/slurm/17.02/lappy$ sudo chmod +s /usr/lib/libcr_run.so
brian@lappy:~/slurm/17.02/lappy$ ls -l /usr/lib/libcr_run.so.0.5.5
-rwSr-Sr-- 1 root root 10336 Aug 4 2016 /usr/lib/libcr_run.so.0.5.5
brian@lappy:~/slurm/17.02/lappy$ LD_PRELOAD=libcr_run.so xterm
brian@lappy:~/slurm/17.02/lappy$ LD_PRELOAD=blah xterm
ERROR: ld.so: object 'blah' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object 'blah' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object 'blah' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
brian@lappy:~/slurm/17.02/lappy$ sudo chmod -s /usr/lib/libcr_run.so
brian@lappy:~/slurm/17.02/lappy$ LD_PRELOAD=libcr_run.so xterm
ERROR: ld.so: object 'libcr_run.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object 'libcr_run.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.

With the set-user-ID bit on libcr_run.so, the bare name is accepted and the error disappears; a nonexistent name still fails, and clearing the bit brings the error back.

I'm investigating why the code explicitly sets LD_PRELOAD. We'll get back to you on what we find.
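The man-page rule above can be turned into two small shell checks that help predict whether a given program and library will produce this message on a node. This is a sketch, not a complete model of ld.so. Both function names are hypothetical. runs_setid only looks at the set-user-ID and set-group-ID bits on the binary, which is one common reason a program such as xterm runs in secure-execution mode if it is installed that way (this varies by distribution). It does not check file capabilities or other triggers that ld.so(8) lists. The set of "standard search directories" in preload_ok_in_secure_mode is an assumed list; adjust it for the node's architecture.

```shell
# Hypothetical diagnostic helpers based on the ld.so(8) LD_PRELOAD rule.

# Succeeds if the binary carries the set-user-ID or set-group-ID bit, one
# common cause of secure-execution mode (file capabilities are NOT checked).
runs_setid() {
    [ -u "$1" ] || [ -g "$1" ]
}

# Succeeds if a bare-name LD_PRELOAD entry resolving to this file would still
# be loaded in secure-execution mode: the file must sit in a standard search
# directory (ASSUMED list below) and have the set-user-ID bit.
# test -u follows symlinks, so a symlink path checks its target's mode.
preload_ok_in_secure_mode() {
    lib=$1
    dir=$(dirname "$lib")
    case "$dir" in
        /lib|/lib64|/usr/lib|/usr/lib64) ;;
        *) return 1 ;;
    esac
    [ -u "$lib" ]
}
```

On a compute node one might run, for example, runs_setid "$(command -v xterm)" && echo "xterm is set-id", and preload_ok_in_secure_mode /lib64/libcr_run.so || echo "bare libcr_run.so will be ignored in secure-execution mode".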
I've created Bug 4192 to track the BLCR issue. Do you have any other questions regarding the upgrade? If not, let's close this one.

Thanks,
Brian
Okay, closing it.
Info given.