Delay in read/write pipe when many requests for unix pipe (sockerpair) communication in Golang

335 Views Asked by At

I have a performance problem with sockerpair in Golang (also known as unix pipe inter-process communication).

The delay would be very long, if there are many requests issued to pipe. One process sends out about 10K requests; and then the reader process of the pipe (sockerpair reader) would be stalled for long time. Right now, I am using syscall.read() and syscall.write().

Any suggestions that I can improve the read latency in the pipe? I have tried using syscall.Recvmsg and syscall.Sendmsg, but they are not help here as I am not familiar how to set the flag correctly.

Following are stace logs, For each line, first element is PID, second one is how much time spent on this instruction, third one is handler to the process, forth one is the operations:

Normal ones:

**13151** 0.000020 [00000000004f23c4] read(8, 
13496 0.000023 [000000000046bc07] select(0, NULL, NULL, NULL, {0, 3} 
13149 0.000020 [000000000046bc07] <... select resumed> ) = 0 (Timeout) <0.000104>
13149 0.000026 [000000000046bc07] select(0, NULL, NULL, NULL, {0, 20} 
13496 0.000044 [000000000046bc07] <... select resumed> ) = 0 (Timeout) <0.000077>
13496 0.000022 [000000000046c023] futex(0xc4204ae110, FUTEX_WAKE, 1) = 1 <0.000018>
13495 0.000047 [000000000046c023] <... futex resumed> ) = 0 <0.021090>
13149 0.000012 [000000000046bc07] <... select resumed> ) = 0 (Timeout) <0.000117>
13495 0.000063 [000000000046c023] futex(0xc4204ae110, FUTEX_WAIT, 0, NULL 
13496 0.000033 [00000000004f23c4] write(2, "2016/10/21 17:31:35 [/0-0] flush"..., 55 
13149 0.000023 [000000000046bc07] select(0, NULL, NULL, NULL, {0, 20} 
13496 0.000039 [00000000004f23c4] <... write resumed> ) = 55 <0.000044>
13496 0.000030 [00000000004f23c4] write(8, "\20\0\0\0\0\0\0\0a\0\0\0\0\0\0\0", 16) = 16 <0.000023>
13149 0.000042 [000000000046bc07] <... select resumed> ) = 0 (Timeout) <0.000100>
13496 0.000022 [000000000046c023] futex(0xc4204b8110, FUTEX_WAIT, 0, NULL 
**13151** 0.000016 [00000000004f23c4] <... read resumed> "@\0\0\0\22\0\0\0b\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 135168) = 64 <0.000456>

Abnormal (or long delay) ones:

13151 0.000005 [00000000004f23c4] read(8, 
13153 0.000008 [000000000046bc07] select(0, NULL, NULL, NULL, {0, 3} 
13149 0.000010 [000000000046bc07] select(0, NULL, NULL, NULL, {0, 20} 
13153 0.000061 [000000000046bc07] <... select resumed> ) = 0 (Timeout) <0.000064>
13153 0.000012 [000000000046c023] futex(0xc42007a910, FUTEX_WAKE, 1) = 1 <0.000010>
13158 0.000020 [000000000046c023] <... futex resumed> ) = 0 <0.031368>
13149 0.000008 [000000000046bc07] <... select resumed> ) = 0 (Timeout) <0.000095>
13158 0.000013 [000000000046c023] futex(0xc42007a910, FUTEX_WAIT, 0, NULL 
13153 0.000006 [00000000004f23c4] write(8, "\20\0\0\0\332\377\377\377\n\0\0\0\0\0\0\0", 16 
13149 0.000008 [000000000046bc07] select(0, NULL, NULL, NULL, {0, 20} 
13153 0.000020 [00000000004f23c4] <... write resumed> ) = 16 <0.000022>
13153 0.000028 [000000000046c023] futex(0xc4200e0110, FUTEX_WAIT, 0, NULL 
13149 0.000054 [000000000046bc07] <... select resumed> ) = 0 (Timeout) <0.000085>
13149 0.000013 [000000000046bc07] select(0, NULL, NULL, NULL, {0, 20}) = 0 (Timeout) <0.000084>
13149 0.000100 [000000000046bc07] select(0, NULL, NULL, NULL, {0, 20}) = 0 (Timeout) <0.000083>
13149 0.000098 [000000000046bc07] select(0, NULL, NULL, NULL, {0, 20}) = 0 (Timeout) <0.000083>
13149 0.000099 [000000000046bc07] select(0, NULL, NULL, NULL, {0, 20}) = 0 (Timeout) <0.000083>
...... A lot more same record happens here.
13149 0.000721 [000000000046bc07] select(0, NULL, NULL, NULL, {0, 1280}) = 0 (Timeout) <0.001344>
13149 0.000098 [000000000046c023] futex(0x745b78, FUTEX_WAIT, 0, {60, 0} 
13151 30.186205 [00000000004f23c4] <... read resumed> "P\0\2\0\20\0\0\0\v\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 135168) = 131152 <30.201482>
0

There are 0 best solutions below