2.3. 模拟磁盘失败

磁盘故障有两种:硬和软。硬故障意味着需要替换磁盘。软故障可能是设备驱动程序或其它一些软件组件的问题。

如果是软故障,可能不需要替换磁盘。如果替换磁盘,则需要遵循步骤来删除失败的磁盘,并将替换磁盘添加到 Ceph。为了模拟软磁盘失败,最好的操作是删除该设备。选择一个设备并从系统中删除设备。

先决条件

  • 一个正常运行的 Red Hat Ceph Storage 集群。
  • 对 Ceph OSD 节点的 root 级别访问权限。

流程

  1. sysfs 中删除块设备:

    语法

    echo 1 > /sys/block/BLOCK_DEVICE/device/delete

    示例

    [root@osd ~]# echo 1 > /sys/block/sdb/device/delete

    在 Ceph OSD 日志中,Ceph 会检测到故障,并且自动启动恢复过程。

    示例

    [root@osd ~]# tail -50 /var/log/ceph/ceph-osd.1.log
    2020-09-02 15:50:50.187067 7ff1ce9a8d80  1 bdev(0x563d263d4600 /var/lib/ceph/osd/ceph-2/block) close
    2020-09-02 15:50:50.440398 7ff1ce9a8d80 -1 osd.2 0 OSD:init: unable to mount object store
    2020-09-02 15:50:50.440416 7ff1ce9a8d80 -1 ^[[0;31m ** ERROR: osd init failed: (5) Input/output error^[[0m
    2020-09-02 15:51:10.633738 7f495c44bd80  0 set uid:gid to 167:167 (ceph:ceph)
    2020-09-02 15:51:10.633752 7f495c44bd80  0 ceph version 12.2.12-124.el7cp (e8948288b90d312c206301a9fcf80788fbc3b1f8) luminous (stable), process ceph-osd, pid 36209
    2020-09-02 15:51:10.634703 7f495c44bd80 -1 bluestore(/var/lib/ceph/osd/ceph-2/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-2/block: (5) Input/output error
    2020-09-02 15:51:10.635749 7f495c44bd80 -1 bluestore(/var/lib/ceph/osd/ceph-2/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-2/block: (5) Input/output error
    2020-09-02 15:51:10.636642 7f495c44bd80 -1 bluestore(/var/lib/ceph/osd/ceph-2/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-2/block: (5) Input/output error
    2020-09-02 15:51:10.637535 7f495c44bd80 -1 bluestore(/var/lib/ceph/osd/ceph-2/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-2/block: (5) Input/output error
    2020-09-02 15:51:10.641256 7f495c44bd80  0 pidfile_write: ignore empty --pid-file
    2020-09-02 15:51:10.669317 7f495c44bd80  0 load: jerasure load: lrc load: isa
    2020-09-02 15:51:10.669387 7f495c44bd80  1 bdev create path /var/lib/ceph/osd/ceph-2/block type kernel
    2020-09-02 15:51:10.669395 7f495c44bd80  1 bdev(0x55a423da9200 /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
    2020-09-02 15:51:10.669611 7f495c44bd80  1 bdev(0x55a423da9200 /var/lib/ceph/osd/ceph-2/block) open size 500103643136 (0x7470800000, 466GiB) block_size 4096 (4KiB) rotational
    2020-09-02 15:51:10.670320 7f495c44bd80 -1 bluestore(/var/lib/ceph/osd/ceph-2/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-2/block: (5) Input/output error
    2020-09-02 15:51:10.670328 7f495c44bd80  1 bdev(0x55a423da9200 /var/lib/ceph/osd/ceph-2/block) close
    2020-09-02 15:51:10.924727 7f495c44bd80  1 bluestore(/var/lib/ceph/osd/ceph-2) _mount path /var/lib/ceph/osd/ceph-2
    2020-09-02 15:51:10.925582 7f495c44bd80 -1 bluestore(/var/lib/ceph/osd/ceph-2/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-2/block: (5) Input/output error
    2020-09-02 15:51:10.925628 7f495c44bd80  1 bdev create path /var/lib/ceph/osd/ceph-2/block type kernel
    2020-09-02 15:51:10.925630 7f495c44bd80  1 bdev(0x55a423da8600 /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
    2020-09-02 15:51:10.925784 7f495c44bd80  1 bdev(0x55a423da8600 /var/lib/ceph/osd/ceph-2/block) open size 500103643136 (0x7470800000, 466GiB) block_size 4096 (4KiB) rotational
    2020-09-02 15:51:10.926549 7f495c44bd80 -1 bluestore(/var/lib/ceph/osd/ceph-2/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-2/block: (5) Input/output error

  2. 查看 Ceph OSD 磁盘树,我们也会看到磁盘离线。

    示例

    [root@osd ~]# ceph osd tree
    ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
    -1 0.28976 root default
    -2 0.09659     host ceph3
     1 0.09659         osd.1       down 1.00000          1.00000
    -3 0.09659     host ceph1
     2 0.09659         osd.2       up  1.00000          1.00000
    -4 0.09659     host ceph2
     0 0.09659         osd.0       up  1.00000          1.00000