Ceph commands exhaust all available sockets on a Ceph cluster with a high number of OSDs, why?

Solution In Progress

Issue

  • Executing a Ceph command consumes all available file handles on the machine where it is run.

  • The logs show errors such as:

2015-02-24 04:43:22.277355 7f981e6e6700  0 -- [2607:f298:4:2243::8008]:0/1031296 >> [2607:f298:4:2243::6336]:6810/27102 pipe(0x7f991838cfc0 sd=-1 :0 s=1 pgs=0 cs=0 l=1 c=0x7f9918329b30).fault
2015-02-24 04:43:22.277370 7f981e6e6700 -1 -- [2607:f298:4:2243::8008]:0/1031296 >> [2607:f298:4:2243::6336]:6810/27102 pipe(0x7f991838cfc0 sd=-1 :0 s=1 pgs=0 cs=0 l=1 c=0x7f9918329b30).connect couldn't created socket (24) Too many open files
  • Almost any Ceph command that tries to change a configuration, such as 'ceph tell', can trigger this. For example, the following commands triggered the problem:
# ceph tell 'osd.*' injectargs '--osd_max_backfills 2'

# rados -p .log cleanup --prefix 2012
  • In all observed cases, the command does not fail to create a socket and then exit; instead it keeps trying to open new sockets ad infinitum.

  • Such behavior may have seemed more useful when the tools were written, but because the tools themselves apparently use up all the available file descriptors on sockets and never close them once their work is done, no sockets ever become available again. The commands keep spewing these errors until they are manually halted.

  • The sockets should be closed as soon as they are no longer needed, rather than held open. Until that is fixed in the tools, see the stopgap sketch after this list.
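
The following stopgap is a sketch only and is not part of the original report: confirm the per-process open file limit on the node running the command, and temporarily raise it in the shell so the tool can open one socket per OSD. The limit value 16384 and the <pid> placeholder are illustrative, not recommendations.

# ulimit -n

# ulimit -n 16384

# ceph tell 'osd.*' injectargs '--osd_max_backfills 2'

While the command runs, the number of file descriptors it is holding can be counted from another shell, replacing <pid> with the PID of the ceph or rados process:

# ls /proc/<pid>/fd | wc -l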

Environment

  • Inktank Ceph Enterprise 1.2
