Ceph: correct OSD maintenance procedure


Issue

We recently did maintenance on one of our OSD hosts (serving RBDs to OpenStack KVM client instances) and received corresponding end-user reports of NFS interruptions. The NFS server in question is serving data from an RBD volume and sees messages like:

INFO: task jbd2/vdc2-8:938 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

in dmesg.

We did the usual "ceph osd set noout" during the maintenance window, but are wondering if there are other tunables we should be looking at to avoid client hiccups like this.
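
For reference, the flag-based sequence we follow is roughly the sketch below (a minimal outline only; the reboot stands in for whatever maintenance is actually being done):

ceph osd set noout      # keep down OSDs from being marked "out", which would trigger rebalancing
# ... perform the maintenance and reboot the host ...
ceph osd unset noout    # restore normal out-marking afterwards
ceph -s                 # confirm the cluster returns to HEALTH_OK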

Is it necessary to gracefully/explicitly take the OSDs "down" rather than simply rebooting the system? We assumed the OSDs would shut down gracefully and inform the mons that they are now down.
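
If the answer is to take them down explicitly, we imagine it would look something like the following (a sketch only: the OSD IDs are placeholders, and the sysvinit service commands are an assumption for our hosts):

# Tell the mons up front that these OSDs are going down, instead of
# waiting for peers to report them dead after the heartbeat grace
# period (osd_heartbeat_grace, 20 seconds by default):
ceph osd down 12
ceph osd down 13

# Then stop the daemons cleanly before rebooting (init-system
# dependent; sysvinit syntax shown):
sudo service ceph stop osd.12
sudo service ceph stop osd.13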

Are there any librbd options that determine how long the client waits before it moves on to the next OSD in the map and/or gives up on existing connections?
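
The sort of thing we have in mind is a client-side ceph.conf entry along these lines (the option names and values are our guesses from the documentation, not something we have verified against this release):

[client]
# Assumed tunables: rados osd op timeout / rados mon op timeout make
# librados abort an operation that hangs past N seconds instead of
# blocking indefinitely (the default of 0 means wait forever).
rados osd op timeout = 30
rados mon op timeout = 30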

Environment

  • Inktank Ceph Enterprise 1.2
