select() does not return from schedule() even though the process has received data

Solution Verified - Updated -

Issue

  • In the application product, the client process does not wake up while waiting for data in select(), even though the server process has transmitted data.

    Server process            Client process
                               readv()
                               select()
    writev()   --------------> Not return from select()
    
  • Server process: pdfes
    Client process: pdbes
    The client process (PID 16812) did not return from select().
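For comparison, the behaviour the application expects can be sketched in user space. This is a minimal illustration only, assuming a local UNIX stream socket pair as a stand-in for the pdfes/pdbes connection; the names `server_end` and `client_end` are hypothetical and do not come from the product.

```python
import select
import socket

# Hypothetical stand-ins for the server (pdfes) and client (pdbes) endpoints.
server_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# Nothing has been sent yet, so select() should time out with no ready sockets.
ready_before, _, _ = select.select([client_end], [], [], 0.1)

# The "server" writes; normally this marks the client's socket as readable.
server_end.sendall(b"reply")

# select() is now expected to return the client socket before the timeout expires.
ready_after, _, _ = select.select([client_end], [], [], 5.0)

# Read the data only if select() reported the socket as readable.
data = client_end.recv(16) if ready_after else b""

server_end.close()
client_end.close()
```

In the reported issue, the second select() call corresponds to the one that never returned, even though data had arrived.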
[Backtrace of PID16812]
 crash> bt 16812
 PID: 16812  TASK: 1020cbd97f0       CPU: 2   COMMAND: "pdbes"
  #0 [1001e38dca8] schedule at ffffffff8030c89e
  #1 [1001e38dd80] schedule_timeout at ffffffff8030d331
  #2 [1001e38dde0] do_select at ffffffff8018cabf
  #3 [1001e38ded0] sys_select at ffffffff8018ce3e
  #4 [1001e38df80] system_call at ffffffff8011026a
     RIP: 0000003df2ec0176  RSP: 0000002b1ec27000  RFLAGS: 00010246
     RAX: 0000000000000017  RBX: ffffffff8011026a  RCX: 0000002b0aec9570
     RDX: 0000000000000000  RSI: 00000000005588b8  RDI: 0000000000000007
     RBP: 0000000000000000   R8: 0000000000000000   R9: 000000000000000b
     R10: 0000000000000000  R11: 0000000000000202  R12: 0000000000000000
     R13: 0000007fbffffc10  R14: 0000000000406c70  R15: 0000007fbfffd0c0
     ORIG_RAX: 0000000000000017  CS: 0033  SS: 002b

 [Status of WAIT queue]
 crash> net -s 16812
 PID: 16812  TASK: 1020cbd97f0       CPU: 2   COMMAND: "pdbes"
 FD      SOCKET            SOCK       FAMILY:TYPE SOURCE-PORT DESTINATION-PORT
  3      1016c9118c0      10110e6a0c0 INET:STREAM  0.0.0.0-0 0.0.0.0-0
  4      10145904680      100253e4040 UNIX:STREAM
  6      10066d22400      1016f4d8700 INET:STREAM  0.0.0.0-0 0.0.0.0-2768

 crash> struct sock 0x100253e4040 | grep sk_sleep
   sk_sleep = 0x101459046b0,

 crash> waitq 0x101459046b0
 PID: 16812  TASK: 1020cbd97f0       CPU: 2   COMMAND: "pdbes"

 [Result of netstat]
 # netstat -anp |grep 16812
 -------------------------------------------------------------------
 tcp         0      0 0.0.0.0:57192               0.0.0.0:*                   LISTEN      16812/pdbes
 tcp     13572      0 10.208.131.224:54096        10.208.131.227:57147        ESTABLISHED 16812/pdbes
 unix  2      [ ACC ]     STREAM     LISTENING     2464034988 16812/pdbes       /dev/HiRDB/pth/tk26847
 -------------------------------------------------------------------
 * 13572 bytes of data are waiting in the receive queue of PID 16812.
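The receive-queue figure is the second column (Recv-Q) of the netstat TCP line. As a rough sketch, the value can be pulled out of a captured line as follows; the helper name `recv_queue_bytes` is hypothetical, and the sample line is the ESTABLISHED entry from the output above with its whitespace collapsed.

```python
def recv_queue_bytes(netstat_line):
    """Return (recv_q, send_q) from one `netstat -an` TCP line, or None."""
    fields = netstat_line.split()
    # Only tcp/tcp6 lines carry Recv-Q and Send-Q in columns 2 and 3.
    if len(fields) < 3 or not fields[0].startswith("tcp"):
        return None
    return int(fields[1]), int(fields[2])


line = ("tcp 13572 0 10.208.131.224:54096 "
        "10.208.131.227:57147 ESTABLISHED 16812/pdbes")
print(recv_queue_bytes(line))  # → (13572, 0)
```

A non-zero Recv-Q on a process that is blocked in select() means the data has reached the socket but has not yet been read.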

[Collection of the system info by systemtap]
 Based on the findings above, we collected further information with
 systemtap and found that the server process did not call try_to_wake_up().
 The WAIT queue and the netstat output were in the same state as in the
 investigation of PID 16812 above.

 * The client process (PID 17519) did not return from select().
 ----------------------------------------------------------------------
 …
 pdbes : do_select(pid:17519)
 pdbes : add_wait_queue(pid:17519)
 pdbes : add_wait_queue(pid:17519)
 pdbes : add_wait_queue(pid:17519)
 pdfes : sock_def_readable(sock:0x101CEEAF840)  //pdbes : PID 17519
 pdfes : try_to_wake_up(17519)
 pdbes : do_select(pid:17519)
 pdbes : add_wait_queue(pid:17519)
 pdbes : add_wait_queue(pid:17519)
 pdbes : add_wait_queue(pid:17519)
 pdfes : sock_def_readable(sock:0x101CEEAF840)  //pdbes : PID 17519
 ----------------------------------------------------------------------
 => The trace above is for the client process (PID 17519).
    After the last sock_def_readable(), try_to_wake_up() does not appear
    to have been called.
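For longer traces, the missing wakeup can be searched for mechanically. The sketch below is illustrative only: it assumes that the trace uses the line format shown above and that a successful wakeup is logged on the line immediately following sock_def_readable(). The helper name `unmatched_readable_events` is hypothetical.

```python
TRACE = """\
pdbes : do_select(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdfes : sock_def_readable(sock:0x101CEEAF840)
pdfes : try_to_wake_up(17519)
pdbes : do_select(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdfes : sock_def_readable(sock:0x101CEEAF840)
"""


def unmatched_readable_events(trace_lines):
    """Return indexes of sock_def_readable lines with no try_to_wake_up right after."""
    missing = []
    for i, line in enumerate(trace_lines):
        if "sock_def_readable(" not in line:
            continue
        # Assumption: the wakeup, if any, is logged on the very next line.
        following = trace_lines[i + 1] if i + 1 < len(trace_lines) else ""
        if "try_to_wake_up(" not in following:
            missing.append(i)
    return missing


lines = TRACE.splitlines()
print(unmatched_readable_events(lines))  # → [10]
```

Index 10 is the final sock_def_readable() entry, which matches the observation that the second readable notification was not followed by a wakeup.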

Environment

  • Red Hat Enterprise Linux 5.4
