Ticket 7218 - ctl character in sacct output
Status: RESOLVED DUPLICATE of ticket 2265
Alias: None
Product: Slurm
Classification: Unclassified
Component: Accounting
Version: 18.08.5
Hardware: Linux Linux
Importance: --- 4 - Minor Issue
Assignee: Jason Booth
Reported: 2019-06-10 13:04 MDT by Jenny Williams
Modified: 2019-08-22 10:31 MDT
Site: University of North Carolina at Chapel Hill


Description Jenny Williams 2019-06-10 13:04:44 MDT
When running sacct we are retrieving output that includes "^M" in the job name output.  The expected output is to not have control characters in job names.  Is this something people have encountered? How do they handle that?
Is it the case that jobs like this should be rejected?  Or is it possible this error crept in when the job information was stored?  The events we have found cluster around a particular user so at the moment we believe a user was able to submit the job in this way.

An example:

# sacct -S 2017-08-01T11:00:00 -E 2017-08-01T11:50:00 -a -j 8904818 -P --format=jobname
JobName
.sbatchWING_ChIPSeq
batch



 # sacct -S 2017-08-01T11:00:00 -E 2017-08-01T11:50:00 -a -j 8904818 -P --format=Account,UID,User,JobIDRaw,JobName > /tmp/foo
# cat /tmp/foo
Account|UID|User|JobIDRaw|JobName
.sbatchay1_pi|237782|cuyehara|8904818|H3K9ac_WING_ChIPSeq
rc_dmckay1_pi|||8904818.batch|batch


# less /tmp/foo
Account|UID|User|JobIDRaw|JobName
rc_dmckay1_pi|237782|cuyehara|8904818|H3K9ac_WING_ChIPSeq^M.sbatch
rc_dmckay1_pi|||8904818.batch|batch
Comment 1 Jason Booth 2019-06-10 15:03:25 MDT
Hi Jenny - It is rather odd to see "^M" in a filename or job name. Can you tell us a bit more about where this user is submitting from and how that character ends up as part of the job? For example, are they submitting from a tool on a Windows or Mac system, or creating the job script on Windows or OS X?

If you want to parse the output now, you can use something like sed to remove these characters.

sacct -a -j 17328 -P --format=Account,UID,User,JobIDRaw,JobName |  sed  "s/\r//g" 
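
The sed filter above removes only carriage returns. A minimal sketch of a broader filter follows, assuming you want to drop every ASCII control character except the newline that ends each record. The function name strip_ctl and the sample record are hypothetical, and a newline embedded in a job name would still split the record in two.

```shell
# strip_ctl: hypothetical helper. Deletes ASCII control characters
# (octal 001-011, 013-037 and 177) from stdin and keeps newline (012).
strip_ctl() {
    tr -d '\001-\011\013-\037\177'
}

# Demo on a made-up record that has a carriage return inside the job name.
# Intended use would be: sacct ... -P --format=... | strip_ctl
printf 'acct|1000|someuser|42|myjob\r.sbatch\n' | strip_ctl
```

The demo line prints the record with the carriage return removed, so the job name reads myjob.sbatch.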


> Is this something people have encountered? How do they handle that?

This is not something we encounter often; however, it can be easily remedied by educating the user who submits these files. You may also be able to use the job_submit plugin to catch this, but that would mask the user issue and encourage bad practices.
Comment 2 Jenny Williams 2019-06-10 15:14:42 MDT
Here is another example. 

These are the accounting records for one user since January of this year. The ^M shows up in the job name field. In this case, the names that carry the ^M are either in the row of the accounting record for the job that does not have the job name listed, or are sometimes "bash". There is an instance where ^M is in the field name, but the user frequently has jobs named "bash", and this is infrequent.

The instances tend to cluster together in the output. When the output file is deleted and generated again, it is the same entries that have the ^M.

# less on the output file:

21694595|wrap|
21694595.batch|batch|
21694595.extern|extern|
21694596|wrap|
21694596.batch|batch|
21694596.extern|extern|
21694598|wrap|
21694598.batch|batch|
21694598.extern|extern|
21694599|wrap|
21694599.batch|batch|
21694599.extern|extern|
22088985|bash|
22088985.extern|extern|
22088985.0|bash|
23265006|bash|
23265006.extern|extern|
23265006.0|bash|
23812556|bash^M|
23812556.extern|extern|
23812556.0|bash^M|
23812687|bash^M|
23812687.extern|extern|
23812687.0|bash^M|
23812781|bash|
23812781.extern|extern|
23812781.0|bash|
23812938|bash|
23812938.extern|extern|
23812938.0|bash|
23812938.1|bash|
23812938.2|bash|

# What line numbers ?
# egrep -n -B2 -r $'\r' /tmp/foo
213356-23265006.extern|extern|
213357-23265006.0|bash|
|
213359-23812556.extern|extern|
|
|
213362-23812687.extern|extern|
|
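
The grep output above is hard to read because the terminal acts on each carriage return instead of displaying it. A minimal sketch of one way to list the affected lines with the carriage return made visible, assuming a cat that supports -v (GNU and BSD both do); the helper name show_cr_lines and the sample data are hypothetical:

```shell
# show_cr_lines: hypothetical helper. Prints "lineno:line" for every line of
# the given file that contains a carriage return, rendering the CR as ^M.
show_cr_lines() {
    cr=$(printf '\r')              # a literal carriage return, portable to plain sh
    grep -n "$cr" "$1" | cat -v    # cat -v shows control characters in ^X notation
}

# Demo on made-up records; only the second line contains a carriage return.
tmp=$(mktemp)
printf '1|bash|\n2|bash\r|\n3|extern|\n' > "$tmp"
show_cr_lines "$tmp"               # prints 2:2|bash^M|
rm -f "$tmp"
```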

# How often does this user use this particular command name?
# egrep -n bash /tmp/foo
1354:6291418|bash|
1355:6291448|bash|
1356:6292793|bash|
1358:6292793.0|bash|
1365:6304768|bash|
1367:6304768.0|bash|
2610:12558287|bash|
2612:12558287.0|bash|
2613:12560273|bash|
2615:12560273.0|bash|
2616:12570691|bash|
2618:12570691.0|bash|
2619:12574800|bash|
2621:12574800.0|bash|
2628:13328479|bash|
2630:13328479.0|bash|
2631:13328646|bash|
2633:13328646.0|bash|
2634:13329668|bash|
2636:13329668.0|bash|
3075:15103963|bash|
3077:15103963.0|bash|
6831:15166374|bash|
6832:15166426|bash|
6833:15262166|bash|
6834:15307116|bash|
6836:15307116.0|bash|
6837:15382629|bash|
6839:15382629.0|bash|
6840:15382675|bash|
6842:15382675.0|bash|
6879:15516220|bash|
6881:15516220.0|bash|
6882:15755142|bash|
6884:15755142.0|bash|
6885:15755154|bash|
6887:15755154.0|bash|
213352:22088985|bash|
213354:22088985.0|bash|
213355:23265006|bash|
213357:23265006.0|bash|
|13358:23812556|bash
|13360:23812556.0|bash
|13361:23812687|bash
|13363:23812687.0|bash
213364:23812781|bash|
213366:23812781.0|bash|
213367:23812938|bash|
213369:23812938.0|bash|
213370:23812938.1|bash|
213371:23812938.2|bash|
217769:24180737|bash|
217771:24180737.0|bash|
Comment 3 Jason Booth 2019-06-13 11:32:32 MDT
Hi Jenny - were you able to identify where the user is submitting from and what causes this? 

You mentioned:
>The instances tend to cluster together in the output. when the output file is deleted and generated again it is the same entries that have the ^M.

Can you be more specific, and can you put together a reproducer? 

This looks like either the user is doing something odd or the workflow is inserting the ^M when it should not be.
Comment 4 Jenny Williams 2019-06-13 11:44:02 MDT
Based on the occurrence pattern, we've determined that it is not something the user is doing; it is introduced when the data is stored, independent of the user. The last example showed this: the user is just opening a bash shell. For that user, a few occurrences were clustered together within a short period of time. The occurrences come from various users, at times a few in sequence among a great many submissions. We have various examples.
Comment 5 Jason Booth 2019-06-14 10:56:19 MDT
Hi Jenny - I am having trouble reproducing this without forcing the ^M into a file with Ctrl+V Ctrl+M.

Would you please look at the commonalities for those jobs and see if they are all from the same group or if they submit a similar workflow? If so, please check how they are submitting the job and what they are doing to trigger this event.

>The instances tend to cluster together in the output. when the output file is deleted and generated again it is the same entries that have the ^M.

I am not following you on this comment. What do you mean by "when the output file is deleted and generated again"?


As mentioned previously, it would be good to know whether these job scripts were created or modified on Windows or Mac OS before being submitted, since "^M" is not something Linux/Slurm would normally insert.

This could also be related to the SSH client your Windows users use, as some do not properly clean up \r.

Thank you for the information you have sent so far, and I look forward to seeing your reproducer so that I can better understand the root cause.
Comment 6 Jenny Williams 2019-06-14 13:28:31 MDT
The programmer working on this accounting data, Karen Milberg, summarizes the conditions as follows:

 We have found a number of cases of unexpected characters in the job name field. Patterns/characteristics of this issue are:
* We are NOT currently able to reproduce it, but do see that it is an ongoing issue.
* Unexpected characters in the job name include: "^M" and single newline characters and sequences of multiple (up to 3) newline characters.
* The unexpected characters sometimes occur in the middle of the job name (i.e. there are regular characters after it), and sometimes at the end of the job name.
* The issue occurs for different users, in different groups, running different types of jobs (e.g. bash, R and others), and only for the occasional job, even for similar or near identical submissions.

This pattern does not suggest to us that the issue is due to user action.
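
One way to locate records affected by embedded newlines is to compare each line's field count with the header's. This is only a sketch, assuming 'sacct -P' output with a header line and the default '|' delimiter; the helper name bad_field_count and the sample data are hypothetical. A job name that itself contains '|' would be flagged as well.

```shell
# bad_field_count: hypothetical helper. Reads 'sacct -P'-style text on stdin,
# takes the expected field count from the header line, and prints
# "lineno: line" for every later line whose field count differs.
bad_field_count() {
    awk -F'|' 'NR == 1 { n = NF; next } NF != n { print NR ": " $0 }'
}

# Demo on made-up records; the job name in the second record contains a newline,
# so that record is split across lines 3 and 4 and both are reported.
printf 'JobIDRaw|JobName|State\n1|ok|COMPLETED\n2|bro\nken|FAILED\n' | bad_field_count
```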
Comment 8 Jason Booth 2019-06-17 15:04:32 MDT
Jenny - thank you for the feedback from Karen Milberg. We are tracking this issue in ticket 2265, where the fix would be more holistic and not just for ^M. If you are looking for another way to handle this in the meantime, I suggest doing so in the job_submit plugin by checking job_desc.name.

For example, you could do some type of detection for the job name and fail if it contains undesired characters. 

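function slurm_job_submit(job_desc, part_list, submit_uid)
    -- Sketch: reject the job if its name contains a control character
    -- such as \r or \n (Lua's %c class matches control characters).
    if job_desc.name ~= nil and string.find(job_desc.name, "%c") then
        slurm.log_user("Job name contains control characters")
        return slurm.ERROR
    end

    return slurm.SUCCESS
end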


Since we already have a bug open on this, I am marking this as a duplicate of 2265.

*** This ticket has been marked as a duplicate of ticket 2265 ***
Comment 9 Jenny Williams 2019-08-12 13:52:36 MDT
The bug I filed on this is not associated with user input: the errors in sacct output occur, for instance, in a few jobs of a job array, with jobs before and after showing no such errors.
Comment 10 Jason Booth 2019-08-22 10:31:25 MDT
Hi Jenny

> The bug i filed on this is not associated with user input -- the errors in sacct output occurs for instance over a few jobs over a job array, with jobs before and after showing no such errors.


We recognize this, and as mentioned in comment #8 the issue is not limited to just user inputs and job scripts. This problem has a wider scope that must be tackled holistically. Bug #2265 tries to make this clear by the title "sacct doesn't escape delimiter and other special characters".

Karen mentioned, "We are NOT currently able to reproduce it, but do see that it is an ongoing issue." As far as I can determine, these characters are not being introduced by Slurm. If you can provide a way to reproduce this, I can offer more insight; however, that still leaves us needing to handle escaping of the delimiter and other special characters in sacct, which is what bug #2265 is meant to address.

For now, I am marking this issue as a duplicate of that bug.

*** This ticket has been marked as a duplicate of ticket 2265 ***