select() does not return from schedule() even though the process has received data
Issue
-
In the application, the client process waits for data in select(). Even when the server process transmits data, the client process does not wake up.
Server process              Client process
readv()                     select()
writev() --------------> Not return from select()
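For comparison, the behaviour the application expects can be sketched as follows. This is a minimal illustration only, not the application's code: the function name `select_wakes_on_writev`, the payload, and the use of a UNIX stream socket pair with a forked child are all assumptions made for the example. On a healthy kernel the client's select() should report the socket as readable once the peer has called writev().

```python
import os
import select
import socket


def select_wakes_on_writev(timeout=5.0):
    """Return the bytes the client reads after select() wakes, or None on timeout.

    Hypothetical helper: the child plays the server process, the parent
    plays the client process that waits in select().
    """
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:
        # Child ("server process"): send the data with a single writev().
        try:
            client.close()
            os.writev(server.fileno(), [b"hello ", b"client"])
            server.close()
        finally:
            os._exit(0)

    # Parent ("client process"): wait in select() for the data to arrive.
    server.close()
    try:
        readable, _, _ = select.select([client], [], [], timeout)
        if not readable:
            return None  # select() timed out without reporting readiness
        bufs = [bytearray(64)]
        nbytes = os.readv(client.fileno(), bufs)
        return bytes(bufs[0][:nbytes])
    finally:
        client.close()
        os.waitpid(pid, 0)


if __name__ == "__main__":
    print(select_wakes_on_writev())
```

A return value of None would correspond to the symptom described here, except that the affected process waits without ever being woken rather than timing out.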
-
server process: pdfes
client process: pdbes
The client process (PID 16812) did not return from select().
[Backtrace of PID16812]
crash> bt 16812
PID: 16812 TASK: 1020cbd97f0 CPU: 2 COMMAND: "pdbes"
#0 [1001e38dca8] schedule at ffffffff8030c89e
#1 [1001e38dd80] schedule_timeout at ffffffff8030d331
#2 [1001e38dde0] do_select at ffffffff8018cabf
#3 [1001e38ded0] sys_select at ffffffff8018ce3e
#4 [1001e38df80] system_call at ffffffff8011026a
RIP: 0000003df2ec0176 RSP: 0000002b1ec27000 RFLAGS: 00010246
RAX: 0000000000000017 RBX: ffffffff8011026a RCX: 0000002b0aec9570
RDX: 0000000000000000 RSI: 00000000005588b8 RDI: 0000000000000007
RBP: 0000000000000000 R8: 0000000000000000 R9: 000000000000000b
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
R13: 0000007fbffffc10 R14: 0000000000406c70 R15: 0000007fbfffd0c0
ORIG_RAX: 0000000000000017 CS: 0033 SS: 002b
[Status of WAIT queue]
crash> net -s 16812
PID: 16812 TASK: 1020cbd97f0 CPU: 2 COMMAND: "pdbes"
FD SOCKET SOCK FAMILY:TYPE SOURCE-PORT DESTINATION-PORT
3 1016c9118c0 10110e6a0c0 INET:STREAM 0.0.0.0-0 0.0.0.0-0
4 10145904680 100253e4040 UNIX:STREAM
6 10066d22400 1016f4d8700 INET:STREAM 0.0.0.0-0 0.0.0.0-2768
crash> struct sock 0x100253e4040 | grep sk_sleep
sk_sleep = 0x101459046b0,
crash> waitq 0x101459046b0
PID: 16812 TASK: 1020cbd97f0 CPU: 2 COMMAND: "pdbes"
[Result of netstat]
# netstat -anp |grep 16812
-------------------------------------------------------------------
tcp        0      0 0.0.0.0:57192          0.0.0.0:*              LISTEN      16812/pdbes
tcp    13572      0 10.208.131.224:54096   10.208.131.227:57147   ESTABLISHED 16812/pdbes
unix  2      [ ACC ]     STREAM     LISTENING     2464034988 16812/pdbes  /dev/HiRDB/pth/tk26847
-------------------------------------------------------------------
* 13572 bytes of data are waiting in the receive queue (Recv-Q) of PID 16812.
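The amount of unread data that netstat reports in Recv-Q can also be checked from user space. The sketch below is an illustration under stated assumptions: `pending_bytes` is a hypothetical helper, and it relies on the Linux behaviour that the FIONREAD ioctl on a stream socket returns the number of unread bytes queued for that socket. It uses a local socket pair rather than the TCP connection shown above.

```python
import fcntl
import socket
import struct
import termios


def pending_bytes(sock):
    """Return the number of unread bytes queued on a stream socket.

    Assumes Linux, where FIONREAD on a stream socket reports the
    amount of data waiting in the receive queue.
    """
    raw = fcntl.ioctl(sock.fileno(), termios.FIONREAD, struct.pack("i", 0))
    return struct.unpack("i", raw)[0]


if __name__ == "__main__":
    sender, receiver = socket.socketpair()
    sender.sendall(b"x" * 13572)      # same size as the Recv-Q value above
    print(pending_bytes(receiver))    # bytes queued but not yet read
    sender.close()
    receiver.close()
```

A non-zero result while the same process stays blocked in select() on that descriptor matches the situation observed here: the data has reached the socket, but the waiting process has not been woken.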
[Collection of the system info by systemtap]
Based on the results above, we collected additional information with systemtap
and found that the server process did not call try_to_wake_up().
The wait queue state and the netstat output were the same as in
the investigation of PID 16812 above.
* The client process (PID 17519) did not return from select().
----------------------------------------------------------------------
…
pdbes : do_select(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdfes : sock_def_readable(sock:0x101CEEAF840) //pdbes : PID 17519
pdfes : try_to_wake_up(17519)
pdbes : do_select(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdfes : sock_def_readable(sock:0x101CEEAF840) //pdbes : PID 17519
----------------------------------------------------------------------
=> The trace for the client process (PID 17519) is shown above.
After the second sock_def_readable() call, try_to_wake_up() does not appear to have been called.
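The check applied to the trace above can be sketched as a small script. It assumes the line format shown in the trace ("process : function(arguments)") and treats a sock_def_readable() event as suspicious when the next recorded event is not try_to_wake_up(). The name `missed_wakeups` is hypothetical, and the rule is only a heuristic for reading this particular trace, not a statement about how the kernel must behave in every case.

```python
import re

# Trace lines copied from the systemtap output above.
TRACE = """\
pdbes : do_select(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdfes : sock_def_readable(sock:0x101CEEAF840) //pdbes : PID 17519
pdfes : try_to_wake_up(17519)
pdbes : do_select(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdfes : sock_def_readable(sock:0x101CEEAF840) //pdbes : PID 17519
"""

# Matches "<process> : <function>(" at the start of a line.
LINE_RE = re.compile(r"^\s*(\w+)\s*:\s*(\w+)\(")


def missed_wakeups(trace):
    """Return 1-based line numbers of sock_def_readable() events that are
    not immediately followed by a try_to_wake_up() event."""
    events = []
    for number, line in enumerate(trace.splitlines(), start=1):
        match = LINE_RE.match(line)
        if match:
            events.append((number, match.group(2)))

    missed = []
    for index, (number, func) in enumerate(events):
        if func != "sock_def_readable":
            continue
        following = events[index + 1][1] if index + 1 < len(events) else None
        if following != "try_to_wake_up":
            missed.append(number)
    return missed


if __name__ == "__main__":
    print(missed_wakeups(TRACE))  # prints [11]
```

Because a flagged event at the very end of a trace may only mean that the capture stopped there, such a result should be confirmed against the process state, as was done here with the wait queue and netstat output.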
Environment
- Red Hat Enterprise Linux 5.4