3.2.1. Data Durability

Hello

I think there are some errors in the calculations in this section:
cf. https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/1.3/html/storage_strategies_guide/placement_groups_pgs#data_durability
I propose the following corrections.

Quote:
[In the 20 OSD cluster each OSD only has to copy 50GB each. If the network was the bottleneck, recovery will happen twice as fast.]

Correction:
[In the 20 OSD cluster, each OSD only has to copy 50GB. If the host network or the disk is the bottleneck, recovery will happen twice as fast.]
Note:
If the bandwidth of the network switch is the bottleneck, recovery won't be any faster.
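
To illustrate the note, here is a minimal sketch of a toy model in which recovery time is set by the slower of two limits: the per-OSD (host network or disk) throughput and the shared switch throughput. The function name, the 1000GB total (derived from 20 OSDs * 50GB) and all bandwidth figures are assumptions chosen for illustration only; they do not come from the guide.

```python
def recovery_time_s(total_gb, n_osds, per_osd_gbps, switch_gbps):
    """Rough recovery time in seconds under a toy two-bottleneck model."""
    per_osd_gb = total_gb / n_osds                # data each OSD must copy
    per_osd_time = per_osd_gb * 8 / per_osd_gbps  # GB -> Gbit, limited by host/disk
    switch_time = total_gb * 8 / switch_gbps      # all traffic crosses the switch
    return max(per_osd_time, switch_time)         # the slower limit dominates

# Host or disk is the bottleneck (hypothetical 1 Gbit/s per OSD, fast switch):
host_bound_10 = recovery_time_s(1000, 10, 1, 100)
host_bound_20 = recovery_time_s(1000, 20, 1, 100)

# Switch is the bottleneck (hypothetical 5 Gbit/s shared switch):
switch_bound_10 = recovery_time_s(1000, 10, 1, 5)
switch_bound_20 = recovery_time_s(1000, 20, 1, 5)
```

In this model, doubling the number of OSDs halves the recovery time only in the host-bound case; in the switch-bound case the two values are identical.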

Quote:
[In the 10 OSD cluster described above, if any OSD fails, then approximately 8 placement groups (i.e. 75 pgs / 9 osds being recovered) will only have one surviving copy. And if any of the 8 remaining OSDs fail, the last objects of one placement group are likely to be lost (i.e. 8 pgs / 8 osds with only one remaining copy being recovered). This is why starting with a somewhat larger cluster is preferred (e.g., 50 OSDs).

When the size of the cluster grows to 20 OSDs, the number of placement groups damaged by the loss of three OSDs drops. The second OSD lost will degrade approximately 2 (i.e. 35 pgs / 19 osds being recovered) instead of 8 and the third OSD lost will only lose data if it is one of the two OSDs containing the surviving copy. In other words, if the probability of losing one OSD is 0.0001% during the recovery time frame, it goes from 8 * 0.0001% in the cluster with 10 OSDs to 2 * 0.0001% in the cluster with 20 OSDs.]

Correction:
[In the 10 OSD cluster described above, if a second OSD fails, then approximately 34 placement groups (i.e. 150 pgs * 2 copies / 9 osds being recovered) will have only one surviving copy. And if any of the 8 remaining OSDs fails, the last objects of approximately 5 placement groups are likely to be lost (i.e. 34 pgs * 1 copy / 8 osds with only one remaining copy being recovered). This is why starting with a somewhat larger cluster is preferred (e.g., 50 OSDs).

When the size of the cluster grows to 50 OSDs, the number of placement groups damaged by the loss of three OSDs drops. The second OSD lost will degrade approximately 2 pgs (i.e. 30 pgs * 2 copies / 49 osds being recovered) instead of 34, and the third OSD lost will only lose data if it is one of the two OSDs containing the surviving copy. In other words, if the probability of losing one OSD is 0.0001% during the recovery time frame, it goes from 34 * 0.0001% in the cluster with 10 OSDs to 2 * 0.0001% in the cluster with 50 OSDs.]
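
The arithmetic in the correction can be checked with a short sketch. It assumes placement group copies are spread evenly across the remaining OSDs, and it rounds up, which is what the figures 34, 5 and 2 appear to do. The helper name pgs_at_risk is hypothetical and only used here for illustration.

```python
import math

def pgs_at_risk(pgs_affected, surviving_copies, remaining_osds):
    """Expected number of affected PGs with a copy on one given remaining OSD,
    assuming the copies are spread evenly."""
    return pgs_affected * surviving_copies / remaining_osds

# 10 OSD cluster: 150 PGs on the first failed OSD, 2 other copies each, 9 OSDs left.
second_10 = pgs_at_risk(150, 2, 9)
# Those PGs now have 1 copy each, spread over the 8 OSDs still running.
third_10 = pgs_at_risk(math.ceil(second_10), 1, 8)
# 50 OSD cluster: 30 PGs on the first failed OSD, 2 other copies each, 49 OSDs left.
second_50 = pgs_at_risk(30, 2, 49)

for label, value in [("10 OSDs, second failure", second_10),
                     ("10 OSDs, third failure", third_10),
                     ("50 OSDs, second failure", second_50)]:
    print(f"{label}: {value:.2f} -> rounded up to {math.ceil(value)}")
```

The unrounded values are about 33.33, 4.25 and 1.22, so "approximately 34" is closer to 33 if rounded to the nearest integer instead.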

Best Regards
Francois Scheurer

Responses