Why is the RAM write speed I measure far lower than the speed stated on the module?


I used the command below to measure the RAM write speed, but it reports far less than the speed stated on the RAM.

time dd if=/dev/zero of=tes bs=100M count=10 oflag=dsync && sync
10+0 records in
10+0 records out
1048576000 bytes (1.0 GB) copied, 1.05167 s, 997 MB/s

real    0m1.056s
user    0m0.001s
sys     0m1.053s

I am using DDR3, and I calculate the theoretical max RAM speed with the formula below:

Max transfer rate = clock x number of bits / 8
A DIMM module transfers 64 bits, so:
Max theoretical transfer rate = clock x (64/8)
                              = 1333 x 8
                              = 10,664 MB/s
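
The arithmetic can be checked directly in the shell:

```shell
# Single-channel DDR3-1333: 1333 MT/s effective rate, 64-bit bus, result in MB/s
echo $(( 1333 * 64 / 8 ))   # prints 10664
```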

So the expected theoretical speed should be roughly 10 GB/s, but in reality it comes out far lower. Can anyone please tell me why? Thanks in advance!


Answer (3 votes):

Well, if you write a huge number of small files, you encounter a lot of latency, which lowers the effective write speed. If you write larger files to your RAM, you should be able to achieve higher speeds.
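
A quick way to see the block-size effect on a RAM-backed filesystem (this assumes /dev/shm is mounted as tmpfs, as on most Linux systems; the file name is arbitrary):

```shell
# Same ~100 MB of data, written two ways:
# many small writes -- one write() syscall per 4 KB block
dd if=/dev/zero of=/dev/shm/ddtest bs=4k count=25600
# one large write -- a single write() syscall for the whole 100 MB
dd if=/dev/zero of=/dev/shm/ddtest bs=100M count=1
rm /dev/shm/ddtest
```

The large-block run usually reports a much higher MB/s figure because far fewer syscalls are made for the same amount of data.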

Answer (0 votes):

I am not sure what you are measuring. As I understand it, the dd command copies data: here it reads zeroes from /dev/zero in 100 MB blocks and writes them 10 times to the file tes. In my case, I obtain the following results, shown along with vmstat output. It writes a 1 GB file in the home folder at a speed of 84.8 MB/s.

time dd if=/dev/zero of=tes bs=100M count=10 oflag=dsync && sync
10+0 records in
10+0 records out
1048576000 bytes (1.0 GB) copied, 12.3723 s, 84.8 MB/s

real    0m12.487s
user    0m0.001s
sys     0m0.862s

 VMSTAT

 procs   -----------memory----------  ---swap-- -----io----   -system-- ------cpu-----
 r  b   swpd     free   buff   cache    si   so    bi     bo    in   cs us sy id wa st
 0  0      0 30562108  38424 1418240     0    0     0      0   396 1285  2  1 98  0  0
 0  0      0 30570832  38424 1418240     0    0     0      0   294 2000  2  1 97  0  0
 1  0      0 30644600  38424 1346132     0    0     0      0   186  743  1  0 98  0  0
 0  1      0 31415984  38424  496640     0    0     0  49512   994 2778  1  3 86 10  0
 0  1      0 31416020  38428  496636     0    0     0  52888  1813 5080  1  1 85 13  0
 0  1      0 31311368  38440  599048     0    0     0 102472  2460 6781  1  2 86 12  0
 0  3      0 31206488  38452  701456     0    0     0  87580  2443 6891  1  2 85 12  0
 0  1      0 31100524  38464  803508     0    0     0  90976  2411 6840  1  2 86 12  0
 2  1      0 30995476  38472  906068     0    0     0  87256  2400 6791  1  2 86 12  0
 0  1      0 30890280  38484 1008336     0    0     0  87136  2427 6845  1  2 86 12  0
 0  1      0 30785324  38500 1111016     0    0     4  81592  2406 6696  1  2 85 12  0
 0  1      0 30785356  38508 1111080     0    0     0  77612  2579 7258  1  1 86 12  0
 0  1      0 30680108  38512 1213496     0    0     0 102400  2685 7511  1  2 85 12  0
 0  1      0 30575224  38524 1315560     0    0     0 102428  2446 6667  1  2 85 12  0
 0  1      0 30470072  38532 1417968     0    0     0  87484  2392 6725  1  2 86 12  0
 0  0      0 30572884  38544 1418312     0    0     0  15064   994 3239  1  1 92  6  0
 0  0      0 30572744  38544 1418312     0    0     0      0   128  414  1  0 99  0  0
 0  0      0 30573116  38544 1418312     0    0     0      0   160  466  1  0 99  0  0
 0  0      0 30573168  38544 1418312     0    0     0      0   112  361  1  0 99  0  0

           Used up to 0.77 GB                  total 1024400 KB
Answer (3 votes):

IMO, there are several wrong assumptions in the question, but it is interesting anyway.

The calculation of theoretical RAM speed proposed in the question seems to forget multi-channel architectures. I would use the following formula:

Max transfer rate = clock frequency * transfers per clock * interface width * number of interfaces
                    to be divided by 8 to get the results in bytes/s

In your example, clock frequency = 667 MHz, transfers per clock = 2 (because it is DDR-1333 memory), interface width = 64 bits, and the number of interfaces depends on your motherboard and the number of plugged memory modules. Most recent PCs provide 2 channels. Recent servers provide 3 or 4 channels. The number of interfaces is min(number of modules per CPU, number of channels).
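
For example, for dual-channel DDR3-1333 (using the rounded 667 MHz clock from above), the formula gives:

```shell
# 667 MHz clock * 2 transfers/clock * 64-bit bus * 2 channels, divided by 8 bits/byte
echo $(( 667 * 2 * 64 * 2 / 8 ))   # prints 21344 (MB/s), i.e. about 21.3 GB/s
```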

Some information about the burst rate of DDR3 memory: http://en.wikipedia.org/wiki/DDR3_SDRAM

Now, keep in mind that this bandwidth corresponds to a theoretical burst rate, generally sustainable only for brief periods. Furthermore, it qualifies only the memory module's capabilities; it says nothing about the front-side bus or the CPU's memory controller. In other words, even with very fast memory modules, a slow CPU may not be able to saturate the memory bandwidth. The bottleneck is not always in the memory modules.

On ccNUMA machines (most servers with 2 or 4 sockets), if a CPU core needs to access a block located in a memory bank attached to another CPU, the interconnect bus (QPI or HyperTransport) is used. This bus can also be a bottleneck.

Finally, I think the methodology of the test (using dd) is flawed, because:

  • It does not exercise memory transfers alone, because dd goes through the filesystem interface. Even assuming the resulting file is hosted on a memory filesystem (such as tmpfs or /dev/shm), dd makes system calls to perform the operation, which adds overhead.

  • dd is a single-threaded process. A single core may not be enough to saturate the whole memory bandwidth. On a server with multiple sockets, it certainly is not. On a single-socket system, it depends on the CPU itself.

If you really want to evaluate the actual memory bandwidth and compare it to the theoretical limit, I would suggest using a benchmark program designed for this purpose. For instance, the STREAM benchmark is often used to measure sustainable memory bandwidth.
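
For reference, a typical way to build and run STREAM (after downloading stream.c from the STREAM website; the array size and thread count below are illustrative, not prescriptive):

```shell
# Build with optimization and OpenMP so several cores drive memory traffic;
# STREAM_ARRAY_SIZE should be large enough to defeat the CPU caches.
gcc -O3 -fopenmp -DSTREAM_ARRAY_SIZE=80000000 stream.c -o stream
# The output reports sustained Copy/Scale/Add/Triad bandwidth in MB/s.
OMP_NUM_THREADS=4 ./stream
```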

Answer (0 votes):

dd measures filesystem speed, not RAM speed. Even if you were to dd to /dev/shm (on Linux systems, /dev/shm is a RAM-backed tmpfs), you would still be measuring mostly filesystem and syscall overhead and very little of the memory write throughput.
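
One way to see how much of the time goes to syscall and copy overhead rather than to RAM itself is to take the filesystem out of the picture entirely:

```shell
# /dev/zero -> /dev/null touches no filesystem or disk at all, yet dd still
# performs a read() and a write() per block; the reported rate is therefore an
# upper bound on what any dd-based "RAM speed" test can show.
dd if=/dev/zero of=/dev/null bs=1M count=1024
```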

There are memory test tools for checking RAM speed, both Linux command-line utilities and bootable ones. I use the bootable memtest86 when checking my systems.

Your "max bandwidth" calculation also does not account for addressing and cycle times; actual maximum throughput will be lower. On my DDR3 AMD system I measure a little above 4 GB/s of actual read throughput (Intel is higher, I believe).