Bug 14954

Summary: cvmfs issues with job containers
Product: Slurm
Reporter: Paul Edmon <pedmon>
Component: slurmd
Assignee: Tim McMullan <mcmullan>
Status: RESOLVED DUPLICATE
Severity: 4 - Minor Issue    
Priority: ---    
Version: 22.05.3   
Hardware: Linux   
OS: Linux   
Site: Harvard University

Description Paul Edmon 2022-09-13 07:30:02 MDT
We recently upgraded to 22.05.3 as well as enabling the Job Container plugin.  After the upgrade and enabling the plugin we noticed jobs were not completing cleanly and required the reboot of the node.  We also noticed that cvmfs was not operating properly.  After some digging we found this:

https://cernvm-forum.cern.ch/t/intermittent-client-failures-too-many-levels-of-symbolic-links/156/5

It looks like there is a conflict between autofs and cvmfs (which we use) and the job container plugin.  We are going to turn off the job container plugin, but we'd like to have it on if possible.  Thus I'm raising this as a bug so that you are aware of the issue.  It would be good to not impact user space or dynamic mounting when using the job containers.
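
A minimal diagnostic sketch, assuming the conflict comes down to how autofs mounts appear inside a job's mount namespace. It parses text in the Linux `/proc/<pid>/mountinfo` format and reports, for each autofs mount, whether the mount is marked shared. The names `Mount`, `parse_mountinfo`, and `autofs_report` are hypothetical helpers written for illustration; they are not part of Slurm, autofs, or cvmfs.

```python
from typing import List, NamedTuple, Tuple


class Mount(NamedTuple):
    mount_point: str
    fstype: str
    optional_fields: Tuple[str, ...]  # e.g. ("shared:1",) or ("master:2",)


def parse_mountinfo(text: str) -> List[Mount]:
    """Parse mountinfo-formatted text into Mount records.

    Each line has six fixed fields, zero or more optional fields,
    a lone "-" separator, then the filesystem type.
    """
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        try:
            sep = fields.index("-", 6)  # separator follows the optional fields
        except ValueError:
            continue  # skip blank or malformed lines
        mounts.append(Mount(fields[4], fields[sep + 1], tuple(fields[6:sep])))
    return mounts


def autofs_report(text: str) -> List[Tuple[str, str]]:
    """Return (mount_point, "shared" | "not shared") for every autofs mount."""
    report = []
    for m in parse_mountinfo(text):
        if m.fstype != "autofs":
            continue
        shared = any(f.startswith("shared:") for f in m.optional_fields)
        report.append((m.mount_point, "shared" if shared else "not shared"))
    return report
```

Running `autofs_report(open("/proc/self/mountinfo").read())` once on the host and once from inside a job would show whether the autofs mount points differ between the two views. This is only a way to inspect the state; it does not change any behavior.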
Comment 1 Tim McMullan 2022-09-13 09:19:31 MDT
(In reply to Paul Edmon from comment #0)

Hi Paul!

I'm not certain what role cvmfs would play here, but we are working on the autofs issue in bug 12567.  Based on the issue you linked, I think cvmfs isn't playing a role here and it's all the issue with autofs + job_container/tmpfs.  Would you agree with that assessment after looking at the other bug?

Thanks!
--Tim
Comment 2 Paul Edmon 2022-09-13 09:21:05 MDT
Yeah, I agree.  The cvmfs issue at root is really an autofs issue as 
cvmfs uses autofs under the hood.  I would merge this into that ticket.

-Paul Edmon-

Comment 3 Tim McMullan 2022-09-13 09:23:13 MDT
(In reply to Paul Edmon from comment #2)

Sounds good, Thanks Paul!  I'll merge the tickets now.

*** This bug has been marked as a duplicate of bug 12567 ***