Chapter 6. Create a Cluster
If at any point you run into trouble and you want to start over, execute the following to purge the configuration:
ceph-deploy purgedata <ceph-node> [<ceph-node>]
ceph-deploy forgetkeys
To purge the Ceph packages too, you may also execute:
ceph-deploy purge <ceph-node> [<ceph-node>]
If you execute purge, you must re-install Ceph.
On your Calamari admin node, from the directory you created for holding your configuration details, perform the following steps using ceph-deploy.

Create the cluster:
ceph-deploy new <initial-monitor-node(s)>
For example:
ceph-deploy new node1
Check the output of ceph-deploy with ls and cat in the current directory. You should see a Ceph configuration file, a monitor secret keyring, and a log file of the ceph-deploy procedures. At this stage, you may begin editing your Ceph configuration file.
Note: If you choose not to use ceph-deploy, you will have to deploy Ceph manually, or refer to the Ceph manual deployment documentation and configure a deployment tool (e.g., Chef, Juju, Puppet) to perform each operation ceph-deploy performs for you.

Add the public_network and cluster_network settings under the [global] section of your Ceph configuration file:

public_network = <ip-address>/<netmask>
cluster_network = <ip-address>/<netmask>
These settings distinguish which network is public (front-side) and which network is for the cluster (back-side). Ensure that your nodes have interfaces configured for these networks. We do not recommend using the same NIC for the public and cluster networks.
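As a sketch of how the two subnets partition traffic, the standard library's ipaddress module can check which configured network an interface address falls in. The subnet and address values below are hypothetical examples, not recommendations:

```python
import ipaddress

# Hypothetical subnets; substitute the values from your own ceph.conf.
public_network = ipaddress.ip_network("192.168.0.0/24")
cluster_network = ipaddress.ip_network("10.0.0.0/24")

def network_for(addr):
    """Return which configured network an interface address belongs to."""
    ip = ipaddress.ip_address(addr)
    if ip in public_network:
        return "public"
    if ip in cluster_network:
        return "cluster"
    return "neither"

print(network_for("192.168.0.10"))  # public
print(network_for("10.0.0.10"))     # cluster
```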
Turn on IPv6 if you intend to use it:
ms_bind_ipv6 = true
Add or adjust the osd journal size setting under the [global] section of your Ceph configuration file:

osd_journal_size = 10000

We recommend a general setting of 10 GB. Ceph’s default osd_journal_size is 0, so you will need to set this in your ceph.conf file. To size a journal, take the product of filestore_max_sync_interval and the expected throughput, and multiply that product by two (2). The expected throughput should account for both disk throughput (i.e., sustained data transfer rate) and network throughput; for example, a 7200 RPM disk will likely deliver approximately 100 MB/s. Taking the min() of the disk and network throughput provides a reasonable expected throughput.

Set the number of copies to store (default is 3) and the default minimum number of copies required for writes in a degraded state (default is 2) under the [global] section of your Ceph configuration file. We recommend the default values for production clusters:

osd_pool_default_size = 3
osd_pool_default_min_size = 2
For a quick start, you may wish to set osd_pool_default_size to 2 and osd_pool_default_min_size to 1, so that you can achieve an active+clean state with only two OSDs. These settings establish the networking bandwidth requirements for the cluster network, and the ability to write data with eventual consistency (i.e., you can write data to a cluster in a degraded state if it has min_size copies of the data already).

Set the maximum number of placement groups per OSD. The Ceph Storage Cluster has a default maximum value of 300 placement groups per OSD. You can set a different maximum value in your Ceph configuration file, where n is the maximum number of PGs per OSD:

mon_pg_warn_max_per_osd = n
Multiple pools can use the same CRUSH ruleset. When an OSD has too many placement groups associated to it, Ceph performance may degrade due to resource use and load. This setting warns you, but you may adjust it to your needs and the capabilities of your hardware.
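Because each replica of a placement group counts against some OSD, the per-OSD load can be estimated as below. The pool, replica, and OSD counts are hypothetical:

```python
def pgs_per_osd(total_pgs, replica_size, num_osds):
    """Average placement groups per OSD, counting every replica."""
    return total_pgs * replica_size / num_osds

# Hypothetical cluster: 4096 PGs across all pools, 3 replicas, 40 OSDs.
load = pgs_per_osd(4096, 3, 40)
print(load)        # 307.2
print(load > 300)  # True: exceeds the default mon_pg_warn_max_per_osd of 300
```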
Set a CRUSH leaf type to the largest serviceable failure domain for your replicas under the [global] section of your Ceph configuration file. The default value is 1, or host, which means that CRUSH will map replicas to OSDs on separate hosts. For example, if you want to make three object replicas and you have three racks of chassis/hosts, you can set osd_crush_chooseleaf_type to 3, and CRUSH will place each copy of an object on OSDs in different racks. For example:

osd_crush_chooseleaf_type = 3
The default CRUSH hierarchy types are:
- type 0 osd
- type 1 host
- type 2 chassis
- type 3 rack
- type 4 row
- type 5 pdu
- type 6 pod
- type 7 room
- type 8 datacenter
- type 9 region
- type 10 root
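The hierarchy above maps directly to a lookup from the numeric osd_crush_chooseleaf_type value to the failure-domain name; a small sketch:

```python
# Default CRUSH hierarchy types, keyed by numeric id (from the list above).
CRUSH_TYPES = {
    0: "osd", 1: "host", 2: "chassis", 3: "rack", 4: "row",
    5: "pdu", 6: "pod", 7: "room", 8: "datacenter", 9: "region", 10: "root",
}

def chooseleaf_domain(n):
    """Failure domain CRUSH separates replicas by when osd_crush_chooseleaf_type = n."""
    return CRUSH_TYPES[n]

print(chooseleaf_domain(1))  # host (the default)
print(chooseleaf_domain(3))  # rack
```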
Set max_open_files so that Ceph will set the maximum open file descriptors at the OS level, to help prevent Ceph OSD Daemons from running out of file descriptors:

max_open_files = 131072
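To see how a current process's limits compare against this setting, Python's resource module (Linux/Unix only) reports the soft and hard file-descriptor limits; the 131072 target is taken from the recommendation above:

```python
import resource

MAX_OPEN_FILES = 131072  # value recommended above

# RLIMIT_NOFILE is the per-process cap on open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft} hard={hard}")
if soft < MAX_OPEN_FILES:
    print("soft limit is below the target; Ceph raises it at daemon start")
```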
In summary, your initial Ceph configuration file should have at least the following settings, with appropriate values assigned after the = sign:

[global]
fsid = <cluster-id>
mon_initial_members = <hostname>[, <hostname>]
mon_host = <ip-address>[, <ip-address>]
public_network = <network>[, <network>]
cluster_network = <network>[, <network>]
ms_bind_ipv6 = [true | false]
max_open_files = 131072
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd_journal_size = <n>
filestore_xattr_use_omap = true
osd_pool_default_size = <n>  # Write an object n times.
osd_pool_default_min_size = <n>  # Allow writing n copies in a degraded state.
osd_crush_chooseleaf_type = <n>
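As a sketch, Python's configparser can parse a ceph.conf-style file and check that required keys are present in [global]. The values below are placeholders for illustration, not working settings:

```python
import configparser

# Placeholder ceph.conf fragment; the fsid, hostnames, and subnets are fake.
sample = """
[global]
fsid = 00000000-0000-0000-0000-000000000000
mon_initial_members = node1
mon_host = 192.168.0.1
public_network = 192.168.0.0/24
osd_journal_size = 10000
osd_pool_default_size = 3
osd_pool_default_min_size = 2
"""

# Keys that ceph-deploy new writes and this chapter tells you to add.
required = {"fsid", "mon_initial_members", "mon_host", "public_network"}

cfg = configparser.ConfigParser()
cfg.read_string(sample)
missing = required - set(cfg["global"])
print("missing:", sorted(missing))  # missing: []
```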