Ticket 2208 - SGE command wrappers
Summary: SGE command wrappers
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: User Commands
Version: 16.05.x
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Danny Auble
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2015-12-01 08:57 MST by Tim Wickberg
Modified: 2019-11-03 21:13 MST
CC: 2 users

See Also:
Site: Stanford
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed: 16.05.3, 17.02.0-pre1
Target Release: 16.05
DevPrio: 1 - Paid
Emory-Cloud Sites: ---


Attachments
Wrapper requirements (deleted)
2016-04-22 05:39 MDT, Danny Auble
Wrapper requirements (230.30 KB, image/png)
2016-04-22 05:43 MDT, Danny Auble
Initial patch for Grid Engine qsub functionality (4.87 KB, patch)
2016-04-22 06:11 MDT, Danny Auble
Patch for Grid Engine qsub functionality (6.70 KB, patch)
2016-04-26 03:20 MDT, Danny Auble
Test suite for qsub -> sbatch commands (2.47 KB, text/plain)
2016-05-25 11:02 MDT, Keith Bettinger

Comment 5 Danny Auble 2016-04-22 05:43:55 MDT
Created attachment 3018 [details]
Wrapper requirements

Requirements for the wrappers.
Comment 6 Danny Auble 2016-04-22 06:11:40 MDT
Created attachment 3019 [details]
Initial patch for Grid Engine qsub functionality

This is to be applied on top of the 16.05 code base.  To avoid the confusion of having two qsubs, we opted to add the Grid Engine functionality to the existing Torque qsub command.  To use it, apply the patch to the code and then either build the RPMs and install the torque RPM, or, if building from the command line, run "make install-contrib" after "make install", which installs the qsub wrapper directly.
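
A rough sketch of the command-line route (patch file name, strip level, and install prefix here are hypothetical):

  cd slurm-16.05
  patch -p1 < grid_engine_qsub.patch    # apply the attached patch to the source tree
  ./configure --prefix=/apps/slurm
  make && make install
  make install-contrib                  # installs the qsub wrapper from contribs/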

Please let me know if there are any issues with this patch or something doesn't work quite right.
Comment 7 Danny Auble 2016-04-26 03:20:14 MDT
Created attachment 3031 [details]
Patch for Grid Engine qsub functionality

Here is a patch that includes documentation on the new options.  It also changes the -pe shm $cnt option to set --cpus-per-task instead of -n, as now described in the docs.

This was changed based on an email from the client...

-pe:  We use "-pe shm <slots>" exclusively.  This parallel environment is designed to support applications that use pthreads to manage multiple threads with access to a single pool of shared memory.  The SGE PE restricts the slots used to threads on a single host, so in this respect I think it is equivalent to the --cpus-per-task option of sbatch.

-q: We use queues to segment jobs by time-to-run and by memory-needed.

We have three main queues for general users:
standard: for jobs under 6 hours
extended: for jobs under a week.
large: for jobs which have a high memory requirement

We have two high memory (>1TB) machines, and those accept all the "large" jobs and some of the other two.  The rest of the nodes have "standard" and "extended" queue slots on them.

I agree that our queues are more analogous to Slurm partitions than host lists.
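
To make that mapping concrete, a hedged example (queue name and slot count are illustrative, and the -q to --partition translation is assumed from the note above):

  qsub -pe shm 8 -q extended job.sh
  # expected sbatch equivalent under this patch, roughly:
  sbatch --cpus-per-task=8 -p extended job.sh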
Comment 8 Danny Auble 2016-05-18 09:09:57 MDT
Closing, I believe these are good to go.  Please reopen if needed.
Comment 9 Keith Bettinger 2016-05-25 10:36:17 MDT
I've used the --sbatchline option to compare the sbatch command that is generated to ones that I expected based on our Wrapper requirements.  Here are the issues I came up with:

o -l h_rt: the wrapper assumes its argument is in seconds (which is valid), but h_rt can also be given as HH:MM:SS.
  o Slurm accepts minutes AND HH:MM:SS, so only the seconds form needs to be converted; HH:MM:SS could be passed through unchanged.

o Issue with -o, -e, and -j: it would be good if this script used the default stdout/stderr filenames that SGE uses, and output a default stderr (-e) file if only a stdout (-o) file is given.
  o Proposal: Every sbatch command should have -o and -e, with default values of “$JOBNAME.o%j” and “$JOBNAME.e%j”, respectively, each of which can be overridden by command line arguments.
    o If -j y is given, the sbatch command can omit the -e argument.

o For -b y: the generated sbatch --wrap="executable " has an extra space before the closing quote.

o Would like options that represent defaults to be explicitly put on the command line:
  o qsub -m n ⇒ sbatch --mail-type=NONE
  o qsub -r n ⇒ sbatch --no-requeue

o For -l h_vmem, --mem or --mem-per-cpu may be appropriate, depending on how SGE is configured.  Ours is configured for something equivalent to --mem-per-cpu.
  o I'm not sure what should ship in the release here.  I think --mem might be SGE's default.

o Don’t want warning output for -j and -cwd.
Comment 10 Keith Bettinger 2016-05-25 11:02:35 MDT
Created attachment 3146 [details]
Test suite for qsub -> sbatch commands

This document is a list of qsub commands and their expected sbatch equivalents.  If a qsub command is run with the --sbatchline switch, the sbatch line it generates can be compared directly against the expected one.
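
A sketch of the comparison workflow (file names hypothetical; --sbatchline is assumed to print the generated line to stdout):

  qsub --sbatchline -t 1-5 pwd.sh > generated.txt
  diff generated.txt expected_sbatch_line.txt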
Comment 11 Danny Auble 2016-05-26 09:46:14 MDT
Thanks for the feedback Keith.  Inline comments below...

(In reply to Keith Bettinger from comment #9)
> I've used the --sbatchline option to compare the sbatch command that is
> generated to ones that I expected based on our Wrapper requirements.  Here
> are the issues I came up with:
> 
> o -l h_rt assumes its argument is in seconds (which is valid), but it can
> also support HH:MM:SS.
>   o Slurm accepts minutes AND HH:MM:SS, so only the seconds need to be
> converted; HH:MM:SS could be passed without changes.

This is done, commit 3735ed472657b.
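
An illustrative sketch of the expected result (values hypothetical; the wrapper normalizes the time to minutes either way):

  qsub --sbatchline -l h_rt=7200 job.sh
  # roughly: sbatch -t 120 job.sh
  qsub --sbatchline -l h_rt=02:00:00 job.sh
  # roughly: sbatch -t 120 job.sh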

> 
> o Issue with -o, -e, and -j: it would be good if this script would use the
> default stdout/stderr filenames that SGE uses, and outputs a default stderr
> (-e) file if only a stdout (-o) file is given.
>   o Proposal: Every sbatch command should have -o and -e, with default
> values of “$JOBNAME.o%j” and “$JOBNAME.e%j”, respectively, each of which can
> be overridden by command line arguments.

Hmm, it looks like all variants of qsub do it like this.  I am surprised no one has ever mentioned it before.

>     o If -j y is given, the sbatch command can omit the -e argument.

I believe both of the above are fixed in commit 6f909aea4d; thanks for reporting them.  If this isn't correct or expected, let me know.
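
To illustrate the intended defaults (script name hypothetical; the wrapper's exact output may differ slightly):

  qsub myjob.sh
  # roughly: sbatch -o myjob.sh.o%j -e myjob.sh.e%j myjob.sh
  qsub -j y myjob.sh
  # with -j y stderr merges into stdout, so -e is omitted:
  # roughly: sbatch -o myjob.sh.o%j myjob.sh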

> 
> o For -b y: the sbatch --wrap=”executable “ has an extra space before the
> quote.

Fixed in commits fc6d005e1993 and 6c74751b42.

> 
> o Would like options that represent defaults to be explicitly put on the
> command line:
>   o qsub -m n ⇒ sbatch --mail-type=NONE

It turns out this didn't work until commit 2a8177345ce; it is now fixed :).

I added the functionality to set it explicitly in commit 4c40ccbe7119.

>   o qsub -r n ⇒ sbatch --no-requeue

Added in commit 4e88a2c010f.

> 
> o For -l h_vmem, --mem or --mem-per-cpu are both appropriate, based on how
> the SGE is configured.  Ours is configured for something equivalent to
> --mem-per-cpu.
>   o I'm not sure what should be in the release here.  I think --mem might be
> SGE's default.

As you probably know, we currently set it to --mem.  How would you like the wrapper to decide when to do it differently?  I can hard-code it to --mem-per-cpu instead; just let me know if that is what you would like me to do.

> 
> o Don’t want warning output for -j and -cwd.

Removed in commits 6f909aea4d and 130d782ff4.

Let me know if there is anything else you need/find.
Comment 12 Danny Auble 2016-06-02 07:00:37 MDT
Keith, is there anything else you need with this?
Comment 13 Keith Bettinger 2016-06-03 10:04:05 MDT
A couple more things:

In the default -e and -o arguments, a ‘basename’ should be run on the script name to get the filename used.
  qsub /usr/bin/perl
  now results in 
  sbatch -e /usr/bin/perl.e%j -o /usr/bin/perl.o%j /usr/bin/perl

Needless attempts to convert HH:MM:SS to minutes are still being made for "qsub -l h_rt".  "sbatch -t" should be able to accept HH:MM:SS times, no?

Everything else is looking good.
Comment 14 Danny Auble 2016-06-06 07:45:47 MDT
(In reply to Keith Bettinger from comment #13)
> A couple more things:
> 
> In the default -e and -o arguments, a ‘basename’ should be run on the script
> name to get the filename used.
>   qsub /usr/bin/perl
>   now results in 
>   sbatch -e /usr/bin/perl.e%j -o /usr/bin/perl.o%j /usr/bin/perl

Thanks for pointing that out, this is fixed in commit 0d451320 which will be in 16.05.1.

> 
> Needless attempts to convert HH:MM:SS to minutes are still being made for
> "qsub -l h_rt".  "sbatch -t" should be able to accept HH:MM:SS times, no?

This logic possibly could be removed, but the Torque qsub equivalent of h_rt accepts slightly different formats.  It was decided to always convert down to minutes rather than handle multiple formats separately.  I don't believe this conversion adds much overhead, as sbatch proper would do similar parsing internally otherwise.  I am in favor of keeping it this way if that is OK.

> 
> Everything else is looking good.

Excellent.  Please let me know if you find anything else; otherwise I will close the bug.
Comment 15 Danny Auble 2016-06-20 12:36:02 MDT
Hey Keith, anything else on this?
Comment 16 Keith Bettinger 2016-06-21 07:59:11 MDT
OK, all the changes that you've committed look like they are working.  Thanks for those.

I see your reasoning for the time parameters (minutes vs HH:MM:SS) and agree that minutes make the most sense.

I've run my behavior tests, and have the following feedback:

For the array job argument (-t n-m), the output filename should look like "job_name.o%j.%a" and the error filename should look like "job_name.e%j.%a", I believe.

For the binary executable argument (-b y), the name of the job was "wrap" when it should be the name of the executable in the absence of the -N argument.

For -l h_vmem=<MB>, while the sbatch command looks fine (--mem=<MB>), when I run the following script as the job:

#!/bin/bash
ulimit -a | fgrep "virtual memory"

the result is this for any value given:

virtual memory          (kbytes, -v) unlimited

This same script run under SGE correctly produces the value given in its h_vmem argument.

I do note that the Slurm configuration I am running within has the following memory defaults:

DefMemPerNode           = UNLIMITED
MaxMemPerNode           = UNLIMITED

I don't know if this matters.

Keith
Comment 17 Danny Auble 2016-06-29 11:53:53 MDT
(In reply to Keith Bettinger from comment #16)
> OK, all the changes that you've committed look like they are working. 
> Thanks for those.
> 
> I see your reasoning for the time parameters (minutes vs HH:MM:SS) and agree
> that minutes make the most sense.

Cool, thanks for the feedback

> 
> I've run my behavior tests, and have the following feedback:
> 
> For the array job argument (-t n-m), the output filename should look like
> "job_name.o%j.%a" and the error filename should look like "job_name.e%j.%a",
> I believe.

Thanks for pointing this out.  I verified with the man page this is indeed the correct format.  This is in commit 7baa5f10d.

> 
> For the binary executable argument (-b y), the name of the job was "wrap"
> when it should be the name of the executable in the absence of the -N
> argument.

Good catch, this is fixed in commit e24b8fea5.
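
As a hedged illustration of the fixed behavior (the job-name flag is assumed to be -J, i.e. sbatch's --job-name):

  qsub -b y hostname
  # roughly: sbatch -J hostname --wrap="hostname"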

> 
> For -l h_vmem=<MB>, while the sbatch command looks fine (--mem=<MB>), when I
> run the following script as the job:
> 
> #!/bin/bash
> ulimit -a | fgrep "virtual memory"
> 
> the result is this for any value given:
> 
> virtual memory          (kbytes, -v) unlimited
> 
> This same script run under SGE correctly produces the value given in its
> h_vmem argument.

Slurm doesn't adjust those ulimit values; instead the limit is enforced through the cgroup created for the job when the task/cgroup plugin is in use.  Whether or not cgroups are used (using them is recommended), the accounting_gather plugin monitors the memory usage of the job and kills it if it exceeds the limit.
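
For reference, a sketch of the configuration this relies on (plugin choices are site-specific; these lines are only illustrative):

  # slurm.conf
  TaskPlugin=task/cgroup
  ProctrackType=proctrack/cgroup
  JobAcctGatherType=jobacct_gather/cgroup

  # cgroup.conf
  ConstrainRAMSpace=yes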

> 
> I do note that the Slurm configuration I am running within has the following
> memory defaults:
> 
> DefMemPerNode           = UNLIMITED
> MaxMemPerNode           = UNLIMITED
> 
> I don't know if this matters.

That isn't relevant to this.  I believe this is just a different way of doing things.  Please let me know if you see anything else.

> 
> Keith
Comment 18 Keith Bettinger 2016-06-29 12:01:35 MDT
I am out of the office from Jun 28th through July 10th.  I will check my
email only sporadically, so I cannot guarantee a typical response time.  I
will be back in the office on 7/11/2016.

If you have an urgent matter concerning the SCG Cluster or other GBSC
services, you can contact the entire GBSC Informatics/IT team at
scg-action@lists.stanford.edu, who should be able to help you.

Keith
Comment 19 Danny Auble 2016-07-14 18:21:57 MDT
Keith, please let me know if there is anything else on this.

Thanks
Comment 20 Keith Bettinger 2016-07-20 15:41:34 MDT
Hi Danny,

Feedback on your recent fixes:

"-t": While I got the sbatch command line I expected, it behaved differently than I thought it would.

I ran "qsub.pl -t 1-5 pwd.sh" and only got output files "pwd.sh.{o,e}<JID>.5".  Shouldn't I have also gotten files ending in .1, .2, .3, and .4?

"-b y": Works as expected.  Thanks.

As for "-l h_vmem=<MB>", is there any way I can test the limits that the process runs within, or should I just trust the command line arguments, which look fine?

Keith
Comment 21 Danny Auble 2016-07-20 17:06:46 MDT
(In reply to Keith Bettinger from comment #20)
> Hi Danny,
> 
> Feedback on your recent fixes:
> 
> "-t": While I got the sbatch command line I expected, it behaved differently
> than I thought it would.
> 
> I ran "qsub.pl -t 1-5 pwd.sh" and only got output files
> "pwd.sh.{o,e}<JID>.5".  Shouldn't I have also gotten files ending in .1, .2,
> .3, and .4?

I would expect you to get them all.  I'll check this out and let you know what I find.

> 
> "-b y": Works as expected.  Thanks.

Awesome.

> 
> As for "-l h_vmem=<MB>", is there any way I can test the limits that the
> process runs within, or should I just trust the command line arguments,
> which look fine?

A couple of quick, non-destructive ways to test it: run the command and then

scontrol show job and see if the memory limit is set correctly.

sacct would also report it given the correct field.

sacct --format=jobid,reqtres,alloctres

If you want to test it live, you can create an application that just allocates memory and see if it is killed when it runs up against the memory limit.

You need to have the accounting_gather plugin running for this to work.  Using the proctrack/cgroup plugin is recommended as well.
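
A throwaway sketch of such a test (entirely illustrative; any program that keeps allocating and touching memory will do):

  #!/bin/bash
  # keep appending ~10 MB of data to a shell variable until the job's
  # memory limit is hit and Slurm kills it
  data=""
  while true; do
      data+=$(head -c 10485760 /dev/zero | tr '\0' 'x')
      echo "allocated roughly $(( ${#data} / 1048576 )) MB so far"
  done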

Let me know how it goes.

> 
> Keith
Comment 22 Danny Auble 2016-07-21 07:33:55 MDT
Keith, in my testing I get all the expected files.  I am not sure why you are not getting them.  I have tried many different ways but all the files show up as expected.

I do notice they all get the individual Slurm job IDs instead of the master job ID.

On a simple script this is what I get...

test.sh.e5280.5
test.sh.e5281.1
test.sh.e5282.2
test.sh.e5283.3
test.sh.e5284.4
test.sh.o5280.5
test.sh.o5281.1
test.sh.o5282.2
test.sh.o5283.3
test.sh.o5284.4

Would you rather have the master ID (5280) instead, like sacct reports?

sacct -j5280
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
5280_5          test.sh      debug       none          1  COMPLETED      0:0 
5280_5.batch      batch                  none          1  COMPLETED      0:0 
5280_5.0       hostname                  none          1  COMPLETED      0:0 
5280_1          test.sh      debug       none          1  COMPLETED      0:0 
5280_1.batch      batch                  none          1  COMPLETED      0:0 
5280_1.0       hostname                  none          1  COMPLETED      0:0 
5280_2          test.sh      debug       none          1  COMPLETED      0:0 
5280_2.batch      batch                  none          1  COMPLETED      0:0 
5280_2.0       hostname                  none          1  COMPLETED      0:0 
5280_3          test.sh      debug       none          1  COMPLETED      0:0 
5280_3.batch      batch                  none          1  COMPLETED      0:0 
5280_3.0       hostname                  none          1  COMPLETED      0:0 
5280_4          test.sh      debug       none          1  COMPLETED      0:0 
5280_4.batch      batch                  none          1  COMPLETED      0:0 
5280_4.0       hostname                  none          1  COMPLETED      0:0
Comment 23 Keith Bettinger 2016-07-21 14:57:50 MDT
Hi Danny, I think I was confused by the multiple job IDs which were produced.  I do get the same set of files which you showed.

Yes, if my confusion is any indication of how a novice user might react, I think that the job IDs in the output/error files should probably be of the master job (the one returned by qsub.pl/sbatch) and not of the actual Slurm job IDs for the individual tasks, like you suggested.

Also, for -l h_vmem, I looked at jobs with "scontrol show job" and "sacct --format=jobid,reqmem" (reqtres and alloctres didn't show much) and saw what I would have expected.  I think I can trust that if Slurm is asked for a particular memory size, it can reasonably control that the job gets it.

Thanks again for working with me on these issues.
Keith
Comment 24 Danny Auble 2016-07-21 17:26:54 MDT
Keith, I agree with you on the naming.  It makes much more sense to use the master job ID.  Commit 8e008533b49 fixes the file naming.
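
For reference, a hedged sketch of what the generated line should now look like (assuming the standard sbatch filename patterns, where %A is the master array job ID and %a the task index):

  qsub -t 1-5 pwd.sh
  # roughly: sbatch --array=1-5 -o pwd.sh.o%A.%a -e pwd.sh.e%A.%a pwd.sh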

Sorry for all the back and forth on this.  I am glad it is looking more like what you expect though.

Please let me know if there is anything else you would like or not.
Comment 25 Danny Auble 2016-07-25 10:36:16 MDT
Keith, let me know if there is anything left on this.  We plan to tag 16.05.3 tomorrow with this complete.
Comment 26 Keith Bettinger 2016-07-25 13:28:30 MDT
Yes, with the array job default file naming in place, I am ready to call this wrapper complete.  Thanks, Danny, for all the tweaks.
Comment 27 Danny Auble 2016-07-25 13:33:49 MDT
No problem Keith.  Please reopen if you do find anything else.
Comment 28 Locca 2019-11-03 21:13:46 MST
Dear slurm community, 
  Is there any plan to add SGE qhost slurm wrapper? Thanks.

Best Regards,
Locca