About article "IP fragmentation fails..."


The article "IP fragmentation fails and fragmented packets get dropped" at
https://access.redhat.com/solutions/1498603
is good. I have some questions:

In the Root Cause section:

(1) "A bug has been added to address this for RHEL6.6" means a bug fix has been added?
(2) If the global counter is updated less than the per-cpu counter, isn't the global counter lower instead of higher than what the correct value would be?
(3) How can the per-cpu counter be negative e.g. < -130k? Value overflow?
(4) Is it possible that more than doubling the IP fragmentation thresholds is needed? Since the required memory is so small on modern servers, why not give, say, more than 10 MB?

Thanks a lot.

Responses

(1) "A bug has been added to address this for RHEL6.6" means a bug fix has been added?

The issue was logged against RHEL 6.6 and was fixed in 6.8 and 6.7.z as per the Errata listed in the Resolution section.

(4) Is it possible that more than doubling the IP fragmentation thresholds is needed?

Yes.

For the rest of your questions, the person who did most of the work on this is on holiday at the moment. I'll send him an email to see if we're able to answer these in a week or two.

Hi, I am the engineer who worked on this, as Jamie mentioned.

(2) If the global counter is updated less than the per-cpu counter, isn't the global counter lower instead of higher than what the correct value would be?

The issue here is that the values held in the per-CPU counters have not been subtracted from the global value yet. The global value can therefore be much higher than it should be and exceed the frag thresholds, even though there is no leak. Because some machines have many CPUs (a few hundred), the individual per-CPU counters may not yet have reached the threshold of -130K that triggers the global value update. At that point in time the global value is much bigger than it should be.

For example:

The global value is 4MB. There have been 30K decrements on CPU0. There have been 30K decrements on CPU1, and so on for, say, 190 CPUs. These have not been applied to the global value yet because the per-CPU values have not reached the threshold of 130K or -130K. Had there been only 16 CPUs, this might not have been an issue: with fewer CPUs each per-CPU counter would grow faster, so it would cross the threshold sooner and the global value would be updated before the frag threshold was breached.
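
To make the effect concrete, here is a rough Python sketch of a batched per-CPU counter. It is a toy model and not the kernel code: the class name `BatchedCounter`, the 4 MiB threshold, the allocation pattern, and reading "130K" as 130,000 are all assumptions chosen for illustration. Each CPU accounts one large allocation (which gets folded into the global value) and then frees most of it in small pieces that stay in the per-CPU delta.

```python
BATCH = 130_000                 # assumed per-CPU fold threshold ("130K" above)
HIGH_THRESH = 4 * 1024 * 1024   # hypothetical frag threshold of 4 MiB


class BatchedCounter:
    """Toy model of a global counter with per-CPU deltas (not kernel code)."""

    def __init__(self, ncpus, batch=BATCH):
        self.batch = batch
        self.global_count = 0        # the cheap-to-read, approximate value
        self.local = [0] * ncpus     # per-CPU deltas not yet folded in

    def add(self, cpu, amount):
        # Update only the local delta; touch the global value (the
        # expensive, locked path in a real system) once the delta
        # reaches +batch or -batch.
        self.local[cpu] += amount
        if abs(self.local[cpu]) >= self.batch:
            self.global_count += self.local[cpu]
            self.local[cpu] = 0

    def exact(self):
        # The true total is the global value plus every pending delta.
        return self.global_count + sum(self.local)


def simulate(ncpus):
    counter = BatchedCounter(ncpus)
    for cpu in range(ncpus):
        counter.add(cpu, 130_000)        # reaches the batch, folded at once
        for _ in range(32):
            counter.add(cpu, -4_000)     # -128,000 in total, never folded
    return counter.global_count, counter.exact()


for ncpus in (16, 190):
    approx, exact = simulate(ncpus)
    print(ncpus, approx, exact, approx > HIGH_THRESH)
# 16 2080000 32000 False
# 190 24700000 380000 True
```

In this sketch the real usage is tiny in both cases, but with 190 CPUs the global value alone reads well above the hypothetical threshold, while with 16 CPUs it stays below it. The sketch shows the worst-case drift growing with the number of CPUs, which is a slightly different angle from the same workload being concentrated on fewer CPUs.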

The reason for this is that updating the global value is expensive and requires locking. If the threshold were lowered, it would hurt IP fragmentation performance and defeat the purpose of having per-CPU counters in the first place.

(3) How can the per-cpu counter be negative e.g. < -130k? Value overflow?

If I remember correctly, the per-CPU counter values are added to the global counter to do the calculation. They can therefore be negative, which results in a smaller global counter when a per-CPU value hits the threshold and gets added.
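
As a minimal sketch of that behaviour, assuming the per-CPU value is a signed delta and "130K" means 130,000 (the function names and the starting numbers below are made up for illustration, not taken from the kernel):

```python
def fold_if_needed(global_count, local_delta, batch=130_000):
    """Fold the signed per-CPU delta into the global value at +/- batch."""
    if abs(local_delta) >= batch:
        return global_count + local_delta, 0
    return global_count, local_delta


def free_in_steps(global_count, step, count):
    """Account `count` frees of `step` bytes each on a single CPU."""
    local_delta = 0
    for _ in range(count):
        local_delta -= step   # a free makes the delta more negative
        global_count, local_delta = fold_if_needed(global_count, local_delta)
    return global_count, local_delta


print(free_in_steps(4_000_000, 10_000, 12))  # (4000000, -120000)
print(free_in_steps(4_000_000, 10_000, 13))  # (3870000, 0)
```

After twelve frees the delta is -120,000 and the global value is unchanged. The thirteenth free brings the delta to -130,000, so it is added to the global value, which drops by 130,000. No overflow is involved; the negative number is just the net amount freed on that CPU since the last fold.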

Jamie and Jonathan, I think I need to think more about what the counters mean, e.g. whether they represent used or available memory. Thank you both!
