ping and ibnetdiscover not working with realtime kernel after opensm server is restarted
Issue
We experienced several issues with our InfiniBand infrastructure after restarting our primary opensm server:
- Some servers were not able to ping each other over the IPoIB network, while the ibping command worked fine.
- The same servers were not able to run the ibnetdiscover command because it hung.
- The problem did not affect all servers, and we resolved it by rebooting the affected nodes.
- Sometimes, while switching between different opensm masters, a random opensm client crashes and dumps a vmcore.
Environment
- Red Hat Enterprise Linux 6
- kernel-3.0.36-rt57.66.el6rt.x86_64