Network performance degradation after update to Red Hat Enterprise Linux 6.4 from 6.2
Environment
- Red Hat Enterprise Linux 6.3, 6.4
- igb network driver version 3.2.10-k
Issue
This concerns a userspace program that processes messages.
When using Red Hat Enterprise Linux (RHEL) 6.2, we were able to achieve 459k messages per second over a 10 minute duration, with roughly 3.17% of the CPU idle. This is on an HP G8 machine with 24 CPUs and 32 GB of memory, using 11 NICs.
After upgrading this setup to RHEL 6.4, the best we can achieve is 275k messages per second, with 4.38% of the CPU idle.
Traffic over the NICs uses TCP.
We have done an apples-to-apples comparison by using two separate disks, one with RHEL 6.2 and the other with RHEL 6.4; everything else on the disks is identical to the best of our knowledge.
With the 6.2 disk we get 459k messages per second.
With the 6.4 disk we get 275k messages per second.
Resolution
Try disabling TCP window scaling by adding the following line to /etc/sysctl.conf and then activating it with 'sysctl -p':
net.ipv4.tcp_window_scaling = 0
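For example, the change could be applied and verified as follows (a minimal sketch; run as root):
# echo "net.ipv4.tcp_window_scaling = 0" >> /etc/sysctl.conf
# sysctl -p
# sysctl net.ipv4.tcp_window_scaling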
As described in https://access.redhat.com/site/solutions/358513, application changes may also be needed.
Root Cause
The performance degradation was introduced in the igb driver update that went into the 2.6.32-233 kernel via BZ 737714.
Before the update, the igb driver version was:
kernel: Intel(R) Gigabit Ethernet Network Driver - version 3.0.6-k
After the update, the igb driver version is:
kernel: igb: Intel(R) Gigabit Ethernet Network Driver - version 3.2.10-k
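The igb driver version loaded on a given host can be confirmed, for example, with modinfo or by searching the kernel log for the driver banner (commands shown for illustration only):
# modinfo -F version igb
# dmesg | grep -i 'Gigabit Ethernet Network Driver'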
This issue may be caused by the problem described in https://access.redhat.com/site/solutions/358513.
Diagnostic Steps
We can provide detail on any of the instrumentation below for your reference if needed, thank you.
For the first round of testing, we'll start simpler and get baselines. For each test run, we'll want to gather the following 4 items:
1) profiler data over the course of the test run:
a) Download the following script:
- http://people.redhat.com/fhirtz/rhprofiler/rh-profiler13.sh
b) In a terminal session, just run it interactively prior to the test beginning
- ./rh-profiler13.sh 10 (that'll have it sample at 10s increments)
c) After the test run is completed, just hit ctrl-c in that terminal and it'll create a tarball with the data (name noted at runtime).
2) perf record -a -g data (see the example sequence after this list):
a) mkdir /tmp/perf
b) cd /tmp/perf
c) perf record -a -g
d) perf archive (right after the test)
3) Install the 'tuna' package from the base RHEL6 channel and gather the following:
a) tuna -Q
b) tuna -P (while the test is running)
4) sosreport from the host.
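As an illustration, the perf collection from item 2 might look like the following end to end; the 'perf report' step is an assumption, shown only to indicate how a per-command/per-DSO histogram like the ones further below can be produced:
# mkdir /tmp/perf
# cd /tmp/perf
# perf record -a -g        (stop it with ctrl-c once the test run completes)
# perf archive             (packages the symbols needed to read perf.data on another host)
# perf report --sort comm,dso > perf.hist.collapsed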
Thanks again, and please let me know if there are questions or concerns.
I have attached the perf report histogram output for the two runs.
$ head perf.hist.6_2.collapsed
+ 58.84% adc [kernel.kallsyms]
+ 19.91% adc adc
+ 4.87% adc libpthread-2.12.so
+ 4.44% adc libstdc++.so.6.0.13
+ 3.64% adc libdiamlite.so
+ 3.53% adc libc-2.12.so
+ 1.39% adc [igb]
+ 1.19% init [kernel.kallsyms]
+ 0.89% adc libtcmalloc_minimal.so.4
+ 0.40% adc [vdso]
$ head perf.hist.6_4.collapsed
+ 68.83% adc [kernel.kallsyms]
+ 13.83% adc adc
+ 3.35% adc libpthread-2.12.so
+ 3.18% adc libstdc++.so.6.0.13
+ 2.58% adc libc-2.12.so
+ 2.50% adc libdiamlite.so
+ 2.16% init [kernel.kallsyms]
+ 1.78% adc [igb]
+ 0.53% adc libtcmalloc_minimal.so.4
+ 0.32% adc [vdso]
We can easily see the application spending more time in kernel code on the 6.4 kernel (68.83% in [kernel.kallsyms] versus 58.84% on 6.2).
More analysis will follow.
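The per-symbol kernel breakdowns that follow come from the same data; since it was recorded with 'perf record -a -g', a report run redirected to a file, for example, expands the recorded call chains (invocation assumed for illustration):
# perf report > perf.callgraph.txt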
Time in the kernel on 6.4:
- 33.30% adc [kernel.kallsyms]
- _spin_lock
- 64.11% futex_wake
- 32.60% futex_wait_setup
- 0.85% sch_direct_xmit
- 0.83% task_rq_lock
- _spin_lock
- 13.83% adc adc
- 8.30% adc [kernel.kallsyms]
- _spin_lock_bh
- 87.78% lock_sock_nested
- 9.81% ioat2_issue_pending
- 1.89% release_sock
- _spin_lock_bh
- 2.50% adc libdiamlite.so
- 2.30% adc [kernel.kallsyms]
- memcpy
- 85.22% tcp_collapse
- 13.32% tcp_try_rmem_schedule
- 0.73% ip_output
- memcpy
- 1.19% adc [kernel.kallsyms]
- copy_user_generic_string
- 68.36% skb_copy_datagram_iovec
- 16.26% sock_aio_write
- 10.47% get_futex_value_locked
- 4.59% do_readv_writev
- copy_user_generic_string
- 1.12% adc [igb]
- igb_poll
- 99.32% net_rx_action
- 0.68% __do_softirq
- igb_poll
- 0.95% adc [kernel.kallsyms]
- tcp_sendmsg
- 99.52% sock_aio_write
- tcp_sendmsg
- 0.91% adc [kernel.kallsyms]
- kfree
- 89.93% skb_release_data
- 6.57% __audit_syscall_exit
- 2.98% sysret_audit
- 0.51% __kfree_skb
- kfree
- 0.78% adc [kernel.kallsyms]
- tcp_recvmsg
- 93.77% sock_common_recvmsg
- 6.23% sock_recvmsg
- tcp_recvmsg
- 0.77% adc libpthread-2.12.so
- 0.70% adc [kernel.kallsyms]
- tcp_collapse
- 100.00% tcp_try_rmem_schedule
- tcp_collapse
- 0.68% adc libpthread-2.12.so
- 0.62% adc [kernel.kallsyms]
- fget_light
- 58.24% sys_writev
- 38.33% sockfd_lookup_light
- 1.75% system_call_fastpath
- 1.60% sys_recvfrom
- fget_light
- 0.58% adc libstdc++.so.6.0.13
- 0.57% adc libpthread-2.12.so
- 0.53% adc libtcmalloc_minimal.so.4
- 0.53% adc [kernel.kallsyms]
- kmem_cache_free
- 97.66% __kfree_skb
- 0.99% slab_destroy
- 0.52% tcp_recvmsg
- kmem_cache_free
Time in the kernel on 6.2:
- 29.05% adc [kernel.kallsyms]
- _spin_lock
- 60.52% futex_wake
- 33.38% futex_wait_setup
- 1.71% task_rq_lock
- 0.98% ____cache_alloc_node
- 0.90% sch_direct_xmit
- _spin_lock
- 19.91% adc adc
- 3.64% adc libdiamlite.so
- 1.48% adc [kernel.kallsyms]
- tcp_sendmsg
- 97.46% sock_aio_write
- 2.54% do_sync_readv_writev
- tcp_sendmsg
- 1.46% adc [kernel.kallsyms]
- copy_user_generic_string
- 64.55% skb_copy_datagram_iovec
- 16.02% sock_aio_write
- 14.64% get_futex_value_locked
- 4.39% do_readv_writev
- copy_user_generic_string
- 1.43% adc [kernel.kallsyms]
- _spin_lock_bh
- 64.26% lock_sock_nested
- 16.96% ioat2_issue_pending
- 15.64% release_sock
- 1.51% tcp_recvmsg
- 0.96% tcp_sendmsg
- 0.67% dma_issue_pending_all
- _spin_lock_bh
- 1.12% adc libpthread-2.12.so
- 1.00% adc [kernel.kallsyms]
- fget_light
- 56.22% sys_writev
- 40.09% sockfd_lookup_light
- 1.90% system_call_fastpath
- 1.56% sys_recvfrom
- fget_light
- 0.99% adc libpthread-2.12.so
- 0.99% adc [kernel.kallsyms]
- tcp_recvmsg
- 94.69% sock_common_recvmsg
- 5.30% sock_recvmsg
- tcp_recvmsg
- 0.89% adc libtcmalloc_minimal.so.4
- 0.85% adc libstdc++.so.6.0.13
- 0.81% adc libpthread-2.12.so
- 0.70% adc [kernel.kallsyms]
- tcp_write_xmit
- 94.99% __tcp_push_pending_frames
- 4.51% tcp_push
- 0.50% tcp_data_snd_check
- tcp_write_xmit
- 0.66% adc [igb]
- 0.64% adc libstdc++.so.6.0.13
- 0.63% adc [kernel.kallsyms]
- __audit_syscall_exit
- 95.88% sysret_audit
- 1.61% __libc_recv
- 1.20% __libc_writev
- 0.61% __lll_unlock_wake
- __audit_syscall_exit
- 0.60% adc libstdc++.so.6.0.13
- 0.54% adc [kernel.kallsyms]
- kfree
- 70.51% skb_release_data
- 19.02% __audit_syscall_exit
- 9.76% sysret_audit
- 0.70% __kfree_skb
- kfree
- 0.52% adc libstdc++.so.6.0.13
- 0.52% adc libc-2.12.so
- 0.50% adc [kernel.kallsyms]
- list_del
- 60.58% free_block
- 17.90% ____cache_alloc_node
- 5.12% free_pcppages_bulk
- 4.07% __rmqueue
- 3.30% get_page_from_freelist
- 2.88% __drain_alien_cache
- 2.24% napi_complete
- 1.14% kmem_cache_free
- 1.03% __napi_complete
- 0.50% free_hot_cold_page
- list_del
- 0.49% adc libc-2.12.so
- 0.47% adc libc-2.12.so
- 0.46% adc [igb]
- 0.41% adc libstdc++.so.6.0.13
- 0.41% adc [kernel.kallsyms]
- sock_aio_write
- 89.57% do_sync_readv_writev
- 10.42% do_readv_writev
- sock_aio_write
- 0.40% adc libstdc++.so.6.0.13
- 0.40% adc libstdc++.so.6.0.13
- 0.40% adc libpthread-2.12.so
- 0.40% adc [vdso]
- 0.38% adc [kernel.kallsyms]
- tcp_ack
- 98.24% tcp_rcv_established
- 1.76% tcp_v4_do_rcv
- tcp_ack
