Issue with ceph OSD during overcloud deployment

Solution In Progress

Issue

  • During deployment, we get the following error (the failing ceph-volume call can be re-run by hand, as sketched after the log output):
       "ok: [overcloud-compute-hci-8] => (item=/var/log/ceph) => {\"ansible_loop_var\": \"item\", \"changed\": false, \"gid\": 167, \"group\": \"167\", \"item\": \"/var/log/ceph\", \"mode\": \"0755\", \"owner\": \"167\", \"path\": \"/v
ar/log/ceph\", \"secontext\": \"system_u:object_r:container_file_t:s0\", \"size\": 29, \"state\": \"directory\", \"uid\": 167, \"warnings\": [\"The value 167 (type int) in a string field was converted to '167' (type string). If this doe
s not look like what you expect, quote the entire value to ensure it does not change.\", \"The value 167 (type int) in a string field was converted to '167' (type string). If this does not look like what you expect, quote the entire val
ue to ensure it does not change.\"]}",
        "ok: [overcloud-compute-hci-9] => (item=/var/log/ceph) => {\"ansible_loop_var\": \"item\", \"changed\": false, \"gid\": 167, \"group\": \"167\", \"item\": \"/var/log/ceph\", \"mode\": \"0755\", \"owner\": \"167\", \"path\": \"/v
ar/log/ceph\", \"secontext\": \"system_u:object_r:container_file_t:s0\", \"size\": 29, \"state\": \"directory\", \"uid\": 167, \"warnings\": [\"The value 167 (type int) in a string field was converted to '167' (type string). If this doe
s not look like what you expect, quote the entire value to ensure it does not change.\", \"The value 167 (type int) in a string field was converted to '167' (type string). If this does not look like what you expect, quote the entire val
ue to ensure it does not change.\"]}",
        "Friday 11 September 2020  21:55:14 +0200 (0:00:02.719)       0:03:19.309 ****** ",
        "Friday 11 September 2020  21:55:14 +0200 (0:00:00.352)       0:03:19.662 ****** ",
        "[WARNING]: The value False (type bool) in a string field was converted to",
        "'False' (type string). If this does not look like what you expect, quote the",
        "entire value to ensure it does not change.",
        "fatal: [overcloud-compute-hci-0]: FAILED! => {\"changed\": true, \"cmd\": [\"podman\", \"run\", \"--rm\", \"--privileged\", \"--net=host\", \"--ipc=host\", \"--ulimit\", \"nofile=1024:4096\", \"-v\", \"/run/lock/lvm:/run/lock/l
vm:z\", \"-v\", \"/var/run/udev/:/var/run/udev/:z\", \"-v\", \"/dev:/dev\", \"-v\", \"/etc/ceph:/etc/ceph:z\", \"-v\", \"/run/lvm/:/run/lvm/\", \"-v\", \"/var/lib/ceph/:/var/lib/ceph/:z\", \"-v\", \"/var/log/ceph/:/var/log/ceph/:z\", \"
--entrypoint=ceph-volume\", \"10.10.10.10:8787/rhceph/rhceph-4-rhel8:4-32\", \"--cluster\", \"ceph\", \"inventory\", \"--format=json\"], \"delta\": \"0:00:01.592717\", \"end\": \"2020-09-11 21:55:16.871681\", \"msg\": \"non-zero return 
code\", \"rc\": 1, \"start\": \"2020-09-11 21:55:15.278964\", \"stderr\": \"WARNING: The same type, major and minor should not be used for multiple devices.\\n-->  KeyError: 'ceph.cluster_name'\", \"stderr_lines\": [\"WARNING: The same 
type, major and minor should not be used for multiple devices.\", \"-->  KeyError: 'ceph.cluster_name'\"], \"stdout\": \"\", \"stdout_lines\": []}",
        "fatal: [overcloud-compute-hci-10]: FAILED! => {\"changed\": true, \"cmd\": [\"podman\", \"run\", \"--rm\", \"--privileged\", \"--net=host\", \"--ipc=host\", \"--ulimit\", \"nofile=1024:4096\", \"-v\", \"/run/lock/lvm:/run/lock/
lvm:z\", \"-v\", \"/var/run/udev/:/var/run/udev/:z\", \"-v\", \"/dev:/dev\", \"-v\", \"/etc/ceph:/etc/ceph:z\", \"-v\", \"/run/lvm/:/run/lvm/\", \"-v\", \"/var/lib/ceph/:/var/lib/ceph/:z\", \"-v\", \"/var/log/ceph/:/var/log/ceph/:z\", \
"--entrypoint=ceph-volume\", \"10.10.10.10:8787/rhceph/rhceph-4-rhel8:4-32\", \"--cluster\", \"ceph\", \"inventory\", \"--format=json\"], \"delta\": \"0:00:01.600835\", \"end\": \"2020-09-11 21:55:16.883596\", \"msg\": \"non-zero return
 code\", \"rc\": 1, \"start\": \"2020-09-11 21:55:15.282761\", \"stderr\": \"WARNING: The same type, major and minor should not be used for multiple devices.\\n-->  KeyError: 'ceph.cluster_name'\", \"stderr_lines\": [\"WARNING: The same
 type, major and minor should not be used for multiple devices.\", \"-->  KeyError: 'ceph.cluster_name'\"], \"stdout\": \"\", \"stdout_lines\": []}",

[...]

vm:z\", \"-v\", \"/var/run/udev/:/var/run/udev/:z\", \"-v\", \"/dev:/dev\", \"-v\", \"/etc/ceph:/etc/ceph:z\", \"-v\", \"/run/lvm/:/run/lvm/\", \"-v\", \"/var/lib/ceph/:/var/lib/ceph/:z\", \"-v\", \"/var/log/ceph/:/var/log/ceph[47/1808]
--entrypoint=ceph-volume\", \"10.10.10.10:8787/rhceph/rhceph-4-rhel8:4-32\", \"--cluster\", \"ceph\", \"inventory\", \"--format=json\"], \"delta\": \"0:00:01.625044\", \"end\": \"2020-09-11 21:55:17.079106\", \"msg\": \"non-zero return 
code\", \"rc\": 1, \"start\": \"2020-09-11 21:55:15.454062\", \"stderr\": \"WARNING: The same type, major and minor should not be used for multiple devices.\\n-->  KeyError: 'ceph.cluster_name'\", \"stderr_lines\": [\"WARNING: The same 
type, major and minor should not be used for multiple devices.\", \"-->  KeyError: 'ceph.cluster_name'\"], \"stdout\": \"\", \"stdout_lines\": []}",
        "NO MORE HOSTS LEFT *************************************************************",
        "PLAY RECAP *********************************************************************",
        "overcloud-compute-hci-0    : ok=101  changed=3    unreachable=0    failed=1    skipped=192  rescued=0    ignored=0   ",
        "overcloud-compute-hci-1    : ok=94   changed=2    unreachable=0    failed=1    skipped=187  rescued=0    ignored=0   ",
        "overcloud-compute-hci-10   : ok=94   changed=2    unreachable=0    failed=1    skipped=187  rescued=0    ignored=0   ",
        "overcloud-compute-hci-11   : ok=94   changed=2    unreachable=0    failed=1    skipped=187  rescued=0    ignored=0   ",
        "overcloud-compute-hci-2    : ok=94   changed=2    unreachable=0    failed=1    skipped=187  rescued=0    ignored=0   ",
        "overcloud-compute-hci-3    : ok=94   changed=2    unreachable=0    failed=1    skipped=187  rescued=0    ignored=0   ",
        "overcloud-compute-hci-4    : ok=94   changed=2    unreachable=0    failed=1    skipped=187  rescued=0    ignored=0   ",
        "overcloud-compute-hci-5    : ok=94   changed=2    unreachable=0    failed=1    skipped=187  rescued=0    ignored=0   ",
        "overcloud-compute-hci-6    : ok=94   changed=2    unreachable=0    failed=1    skipped=187  rescued=0    ignored=0   ",
        "overcloud-compute-hci-7    : ok=94   changed=2    unreachable=0    failed=1    skipped=187  rescued=0    ignored=0   ",
        "overcloud-compute-hci-8    : ok=94   changed=2    unreachable=0    failed=1    skipped=187  rescued=0    ignored=0   ",
        "overcloud-compute-hci-9    : ok=94   changed=2    unreachable=0    failed=1    skipped=187  rescued=0    ignored=0   ",
        "overcloud-compute-sriov-0  : ok=48   changed=2    unreachable=0    failed=0    skipped=141  rescued=0    ignored=0   ",
        "overcloud-compute-sriov-1  : ok=48   changed=2    unreachable=0    failed=0    skipped=141  rescued=0    ignored=0   ",
        "overcloud-compute-sriov-2  : ok=48   changed=2    unreachable=0    failed=0    skipped=141  rescued=0    ignored=0   ",
        "overcloud-controller-0     : ok=192  changed=9    unreachable=0    failed=0    skipped=269  rescued=0    ignored=0   ",
        "INSTALLER STATUS ***************************************************************",
        "Install Ceph Monitor           : Complete (0:00:16)",
        "Install Ceph Manager           : Complete (0:00:14)",
        "Install Ceph OSD               : In Progress (0:00:41)",
        "\tThis phase can be restarted by running: roles/ceph-osd/tasks/main.yml",
        "Friday 11 September 2020  21:55:17 +0200 (0:00:02.123)       0:03:21.785 ****** ",
        "=============================================================================== ",
        "ceph-config : create ceph initial directories --------------------------- 2.72s",
        "gather facts ------------------------------------------------------------ 2.67s",
        "gather and delegate facts ----------------------------------------------- 2.42s",
        "ceph-config : create ceph initial directories --------------------------- 2.32s",
        "ceph-config : create ceph initial directories --------------------------- 2.29s",
        "ceph-config : look up for ceph-volume rejected devices ------------------ 2.12s",
        "ceph-container-common : pulling 10.10.10.10:8787/rhceph/rhceph-4-rhel8:4-32 image --- 1.54s",
        "ceph-infra : install chrony --------------------------------------------- 1.41s",
        "ceph-facts : get default crush rule value from ceph configuration ------- 1.36s",
        "check for python -------------------------------------------------------- 1.33s",
        "ceph-container-common : get ceph version -------------------------------- 1.26s",
        "ceph-facts : get default crush rule value from ceph configuration ------- 1.24s",
        "ceph-container-common : include release.yml ----------------------------- 1.21s",
        "ceph-infra : enable chronyd --------------------------------------------- 1.18s",
        "ceph-container-common : include fetch_image.yml ------------------------- 1.08s",
        "ceph-facts : include facts.yml ------------------------------------------ 1.03s",
        "ceph-infra : include_tasks setup_ntp.yml -------------------------------- 1.01s",
        "ceph-handler : include check_running_containers.yml --------------------- 1.01s",
        "ceph-validate : include check_system.yml -------------------------------- 1.01s",
        "ceph-container-common : include prerequisites.yml ----------------------- 1.01s"
    ],
    "failed_when_result": true
  • The CephAnsibleDisksConfig and related Ceph parameters we are using are the following (a sketch of how they are passed to the deployment follows the configuration):
  CephAnsibleDisksConfig:
    devices:
      - /dev/sdb
    # No dedicated SSD/NVME drive
    osd_scenario: collocated
    journal_size: 8192

  # https://bugs.launchpad.net/tripleo/+bug/1749544
  CephPoolDefaultSize: 3
  CephPoolDefaultPgNum: 32

  CephPools:
    - name: volumes
      pg_num: 256
      pgp_num: 256

  # Additional Ceph Configs
  CephConfigOverrides:
    # Max PG Number per OSD
    mon_max_pg_per_osd: 2048
    # Increase the max open files limit
    max_open_files: 131072
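
  These parameters would normally sit under parameter_defaults in a Heat environment file included in the overcloud deployment. A minimal sketch, assuming the file is named ceph-config.yaml (a hypothetical name) and that ceph-ansible is enabled through the standard tripleo-heat-templates environment; the remaining options follow the existing deployment command:

    # ceph-config.yaml is a hypothetical file name holding the parameter_defaults shown above
    openstack overcloud deploy --templates \
      -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
      -e ceph-config.yaml \
      ...   # remaining environment files and options from the existing deployment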

Environment

  • Red Hat OpenStack Platform 16.1 (RHOSP)
