My application is using O_DIRECT to flush 2MB worth of data at a time directly to 3-way striped storage (mounted as an LVM volume).
I am getting very poor write speed on this storage. iostat shows that the large requests are being broken up into smaller ones: avgrq-sz is < 20. There is not much read activity on that drive.
It takes around 2 seconds to flush down 2MB worth of contiguous memory blocks (using mlock to ensure that), sector-aligned (using posix_memalign), whereas tests with dd and iozone rate the storage as capable of > 20Mbps of write speed.
I would appreciate any clues on how to investigate this issue further.
PS: If this is not the right forum for this query, I would appreciate pointers to one that would be.
Thanks.
The disk itself may have a maximum request size, there is a tradeoff between request size and latency (the bigger the request sent to the disk, the longer it will likely take to be consumed), and there can be constraints on how much vectored I/O a driver can accept in a single request. Given all of the above, the kernel is going to "break up" single requests that are too large when submitting them further down the stack.
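If you want to see the limits the kernel is actually working with, the block layer exposes them through sysfs; it is worth checking both the device-mapper device backing the LVM volume and the underlying disks (the device names below are placeholders for yours):

```
# Device names are placeholders - substitute your own dm-* and sd* devices.
cat /sys/block/dm-0/queue/max_sectors_kb     # largest request the kernel will issue
cat /sys/block/dm-0/queue/max_hw_sectors_kb  # hardware limit reported by the device
cat /sys/block/sda/queue/max_sectors_kb
cat /sys/block/sda/queue/max_segments        # scatter/gather segment limit per request
```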
Unfortunately it's hard to say why the avgrq-sz is so small (if it's in sectors, that's about 10KBytes per I/O) without seeing the code that actually submits the I/O (maybe your program is submitting 10KByte buffers?). We also don't know if iozone and dd were using O_DIRECT during the questioner's tests. If they weren't, their I/O would have been going into the writeback cache and streamed out later, and the kernel can do that in a more optimal fashion.
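One easy sanity check is to rerun dd with O_DIRECT so the comparison is like-for-like; something along these lines (the output path and count are illustrative):

```
# Write 1GB in 2MB direct-I/O requests to a file on the LVM volume
# (path and count are illustrative).
dd if=/dev/zero of=/mnt/lvmvol/ddtest bs=2M count=512 oflag=direct conv=fsync
```

Watching avgrq-sz in iostat while that runs will show whether 2MB direct writes from dd get split the same way as your application's.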
Note: using O_DIRECT is NOT a go-faster stripe. In the right circumstances O_DIRECT can lower overhead, BUT writing O_DIRECTly to disk increases the pressure on you to submit I/O in parallel (e.g. via AIO/io_uring or via multiple processes/threads) if you want to reach the highest possible throughput, because you have robbed the kernel of its best way of creating parallel submission to the device for you.
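As a rough illustration of what parallel submission buys you, a tool like fio can keep several O_DIRECT writes in flight at once without any application changes (all parameters below are illustrative):

```
# Sequential 2MB O_DIRECT writes with up to 8 requests in flight via libaio
# (filename, size and iodepth are illustrative).
fio --name=seqwrite --filename=/mnt/lvmvol/fiotest --rw=write --bs=2M \
    --direct=1 --ioengine=libaio --iodepth=8 --size=1G
```

If throughput scales as you raise the iodepth, that points at a lack of parallelism in a single-threaded submission path rather than at the storage itself.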