Memory access is too slow

I get similar results running my benchmark (https://hoult.org/test_memcpy.c):

ubuntu@ubuntu:~/programs$ ./test_memcpy 
Byte size :              ns     Speed
        0 :            18.3       0.0 MB/s
        1 :            23.3      40.9 MB/s
        2 :            23.8      80.3 MB/s
        4 :            34.4     110.8 MB/s
        8 :            45.6     167.4 MB/s
       16 :            38.1     400.3 MB/s
       32 :            39.1     779.7 MB/s
       64 :            45.2    1351.8 MB/s
      128 :            56.8    2150.2 MB/s
      256 :            84.8    2880.1 MB/s
      512 :           135.1    3614.7 MB/s
     1024 :           243.8    4006.0 MB/s
     2048 :           447.8    4361.3 MB/s
     4096 :           861.9    4532.0 MB/s
     8192 :          1682.8    4642.7 MB/s
    16384 :          3481.7    4487.7 MB/s
    32768 :         20896.7    1495.5 MB/s
    65536 :         47393.2    1318.8 MB/s
   131072 :         96372.7    1297.0 MB/s
   262144 :        193140.3    1294.4 MB/s
   524288 :        400208.0    1249.4 MB/s
  1048576 :       2133293.0     468.8 MB/s
  2097152 :       9486804.7     210.8 MB/s
  4194304 :      22763531.2     175.7 MB/s
  8388608 :      45851468.8     174.5 MB/s
 16777216 :      92099687.5     173.7 MB/s
 33554432 :     183821750.0     174.1 MB/s
 67108864 :     367601500.0     174.1 MB/s

That’s with the CPU running at 1.5 GHz. Note that a copy both reads and writes every byte, so 174 MB/s of memcpy is 174 MB/s read plus 174 MB/s write, for a total DRAM bandwidth of around 350 MB/s.

At least it’s much faster than the HiFive Unleashed.
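
For anyone who doesn't want to fetch the source, the kind of measurement behind a listing like the one above can be sketched roughly as follows. This is not the actual test_memcpy.c: the function names time_memcpy_ns and mb_per_s are made up for illustration, and the empty asm statement is a GCC/Clang extension used only to stop the compiler from optimising the copies away.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime under strict -std=c11 */
#endif
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Monotonic clock reading in nanoseconds. */
double now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Average time in ns of one memcpy of n bytes, measured over reps copies.
   (Hypothetical helper, not taken from test_memcpy.c.) */
double time_memcpy_ns(void *dst, const void *src, size_t n, int reps) {
    if (reps < 1)
        reps = 1;
    double start = now_ns();
    for (int i = 0; i < reps; i++) {
        memcpy(dst, src, n);
        /* Compiler barrier: forces each copy to really happen. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    return (now_ns() - start) / reps;
}

/* Copy rate in MB/s, assuming 1 MB = 2^20 bytes. */
double mb_per_s(size_t n, double ns) {
    return ns > 0.0 ? (double)n * 1e9 / ns / 1048576.0 : 0.0;
}
```

Calling time_memcpy_ns with malloc'd buffers for sizes doubling from 1 byte up to 64 MB, and printing mb_per_s for each, gives a table of the same shape. The 2^20 convention matches the figures above: 67108864 bytes in 367601500.0 ns works out to 174.1 MB/s.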

The BeagleV beta board (with SiFive U74 cores but possibly a different DDR controller) gives similar results. I have made BeagleBoard and StarFive aware of my concerns about the very slow DRAM speed, and they have assured me that the SoC I have now is only a test item and that all will be fixed in the mass-produced version. I’m dubious, to be honest.

In contrast, the $99 Allwinner D1 “Nezha” evaluation board gives much higher figures with an Alibaba C906 single-issue core running at 1.0 GHz (extract from https://hoult.org/d1_memcpy.txt):

rvbtest@RVboards:~$ ./test_memcpy_std 
Byte size :              ns     Speed
        0 :            50.3       0.0 MB/s
        1 :            54.8      17.4 MB/s
        2 :            61.6      31.0 MB/s
        4 :            71.6      53.3 MB/s
        8 :            91.6      83.3 MB/s
       16 :            93.7     162.9 MB/s
       32 :            99.7     306.2 MB/s
       64 :           111.6     546.8 MB/s
      128 :           140.5     868.5 MB/s
      256 :           198.4    1230.6 MB/s
      512 :           314.0    1554.9 MB/s
     1024 :           551.7    1770.0 MB/s
     2048 :          1011.4    1931.1 MB/s
     4096 :          1937.8    2015.8 MB/s
     8192 :          3795.8    2058.2 MB/s
    16384 :          8336.3    1874.3 MB/s
    32768 :         20937.3    1492.5 MB/s
    65536 :         58882.3    1061.4 MB/s
   131072 :        113748.5    1098.9 MB/s
   262144 :        225554.1    1108.4 MB/s
   524288 :        446150.4    1120.7 MB/s
  1048576 :        927754.9    1077.9 MB/s
  2097152 :       1849499.0    1081.4 MB/s
  4194304 :       3666302.7    1091.0 MB/s
  8388608 :       7309773.4    1094.4 MB/s
 16777216 :      14528070.3    1101.3 MB/s
 33554432 :      28922562.5    1106.4 MB/s
 67108864 :      57848562.5    1106.3 MB/s

For large copies that go all the way out to DRAM, that is about 1106 MB/s vs 174 MB/s. A speed difference of about 6.3x in favour of the Allwinner is not insignificant.

There’s nothing wrong with the U74’s core or L1 cache (2.3x faster than the D1), though the Unmatched’s L2 cache is barely faster than the D1’s RAM at about 1250 vs 1100 MB/s.
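
The ratios quoted here come straight from the two listings above. As a small sketch, with the constant and function names invented for this post, the arithmetic is:

```c
/* memcpy rates in MB/s, copied from the two listings above. */
const double U74_L1   = 4642.7;  /* 8192-byte copy */
const double U74_L2   = 1249.4;  /* 524288-byte copy */
const double U74_DRAM = 174.1;   /* 67108864-byte copy */
const double D1_L1    = 2058.2;  /* Allwinner D1, 8192-byte copy */
const double D1_DRAM  = 1106.3;  /* Allwinner D1, 67108864-byte copy */

/* How many times faster rate a is than rate b. */
double ratio(double a, double b) {
    return a / b;
}

/* A copy reads and writes every byte, so bus traffic is twice the copy rate. */
double total_traffic(double copy_rate) {
    return 2.0 * copy_rate;
}
```

ratio(D1_DRAM, U74_DRAM) comes out at about 6.35, ratio(U74_L1, D1_L1) at about 2.26, ratio(U74_L2, D1_DRAM) at about 1.13, and total_traffic(U74_DRAM) at 348.2 MB/s.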
