Summary: | With BLCR, xterm gives error in pty shell | ||
---|---|---|---|
Product: | Slurm | Reporter: | Brian Christiansen <brian> |
Component: | Scheduling | Assignee: | Brian Christiansen <brian> |
Status: | RESOLVED INFOGIVEN | QA Contact: | |
Severity: | 4 - Minor Issue | ||
Priority: | --- | CC: | alex, hpc-staff |
Version: | 17.02.7 | ||
Hardware: | Linux | ||
OS: | Linux | ||
Site: | NYU | Alineos Sites: | --- |
Atos/Eviden Sites: | --- | Confidential Site: | --- |
Coreweave sites: | --- | Cray Sites: | --- |
DS9 clusters: | --- | HPCnow Sites: | --- |
HPE Sites: | --- | IBM Sites: | --- |
NOAA Site: | --- | OCF Sites: | --- |
Recursion Pharma Sites: | --- | SFW Sites: | --- |
SNIC sites: | --- | Linux Distro: | --- |
Machine Name: | CLE Version: | ||
Version Fixed: | Target Release: | --- | |
DevPrio: | --- | Emory-Cloud Sites: | --- |
Description
Brian Christiansen
2017-09-27 08:42:28 MDT
I'm investigating why the code explicitly sets LD_PRELOAD. We'll get back to you on what we find.

On a side note, in the past we have encouraged others to consider alternatives to BLCR, such as SCR[1]. BLCR has known limitations. One is that it requires each PID that was running at checkpoint time to be available for its own use when the restart happens. We have also had concerns in the past that may have been corrected since, but maybe not: BLCR assumes that files don't change over time and does not check this, it doesn't support GPUs, it causes problems if you have to talk to a license server, and so on.

[1] http://computation.llnl.gov/projects/scalable-checkpoint-restart-for-mpi

After further investigation, this appears to be an issue with how xterm handles LD_PRELOAD. We could do some things to prevent the error from happening, but that would just mask the fact that BLCR won't work with xterm.

Further, we've decided to deprecate the BLCR plugin in 17.11 and remove it in 18.08. BLCR can still be used, but the user will have to run it manually. We recommend investigating other alternatives such as SCR and DMTCP.

Let us know if you have any questions.

Thanks,
Brian

Deprecating BLCR sounds good to me. The project seems to have been inactive since ~January 2013.
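
For illustration only: the kind of workaround alluded to above (keeping a child process such as xterm from seeing LD_PRELOAD) could look roughly like the sketch below. This is not Slurm or BLCR code. The helper names `env_without_preload` and `run_without_preload` are hypothetical, and the sketch assumes the error is triggered by the child inheriting LD_PRELOAD from its parent environment.

```python
import os
import subprocess
import sys

PRELOAD_VAR = "LD_PRELOAD"


def env_without_preload(env=None):
    """Return a copy of env (default: os.environ) with LD_PRELOAD removed.

    Hypothetical helper for illustration; the input mapping is not modified.
    """
    cleaned = dict(os.environ if env is None else env)
    cleaned.pop(PRELOAD_VAR, None)
    return cleaned


def run_without_preload(cmd, env=None):
    """Run cmd with LD_PRELOAD stripped from its environment.

    Hypothetical helper; returns the subprocess.CompletedProcess.
    """
    return subprocess.run(
        cmd,
        env=env_without_preload(env),
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    # Probe child that reports whether it can see LD_PRELOAD.
    probe = [
        sys.executable,
        "-c",
        "import os; print(os.environ.get('LD_PRELOAD'))",
    ]
    result = run_without_preload(probe)
    print(result.stdout.strip())
```

As noted in the thread, this only hides the symptom: a process started this way does not load whatever library was being preloaded, so it should not be expected to take part in checkpointing.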