Hi all,
I’m getting low single-core STREAM bandwidth on the P550 Premier (EIC7700) and I don’t know what I’m doing wrong.
Maybe somebody can help me?
One more disclaimer: I’m a newbie in the RISC-V and P550 world.
OS: default, Ubuntu 24.04.2 LTS
Kernel: default, 6.6.77-1-premier #4 SMP PREEMPT_DYNAMIC Thu Apr 10 00:15:20 UTC 2025
Compiler:
clang -v
Ubuntu clang version 18.1.3 (1ubuntu1)
Target: riscv64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/riscv64-linux-gnu/13
Found candidate GCC installation: /usr/bin/../lib/gcc/riscv64-linux-gnu/14
Selected GCC installation: /usr/bin/../lib/gcc/riscv64-linux-gnu/14
I got STREAM from GitHub (jeffhammond/STREAM).
Main parts of the Makefile:
CC = clang
CFLAGS = -march=rv64gc_zba_zbb -mabi=lp64d -mtune=sifive-u74 -mcmodel=medany -msmall-data-limit=8 -ffunction-sections -fdata-sections -fno-common -ftls-model=local-exec -O3 -falign-functions=4 -mllvm -unroll-count=8 -Wno-unknown-pragmas -Wno-unused-but-set-variable -fopenmp
stream_c.exe: stream.c
	$(CC) $(CFLAGS) stream.c -o stream_c.exe
CPU frequency:
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
1400000
1400000
1400000
1400000
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
1400000
1400000
1400000
1400000
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
performance
performance
performance
performance
Run output:
make; OMP_NUM_THREADS=1 taskset -c 1 ./stream_c.exe
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 1
Number of Threads counted = 1
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 92772 microseconds.
(= 92772 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function Best Rate MB/s Avg time Min time Max time
Copy: 2071.9 0.088507 0.077225 0.093541
Scale: 2092.1 0.088172 0.076479 0.092426
Add: 1193.1 0.203638 0.201155 0.205095
Triad: 1179.1 0.204903 0.203539 0.206480
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
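For reference, here is a minimal C sketch of how the MB/s figures above follow from the best (minimum) times. It assumes the byte counting used by STREAM 5.10: 2 arrays of 8-byte doubles touched per element for Copy and Scale, 3 for Add and Triad, with 1 MB = 10^6 bytes. The function names are hypothetical and not part of stream.c. Note that STREAM does not count write-allocate traffic, so on a write-allocate cache the real DRAM traffic may be higher than the reported rate suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Words (8-byte doubles) moved per element, as STREAM counts them:
 * Copy  c[j] = a[j]              -> 1 read + 1 write = 2
 * Scale b[j] = scalar*c[j]       -> 1 read + 1 write = 2
 * Add   c[j] = a[j]+b[j]         -> 2 reads + 1 write = 3
 * Triad a[j] = b[j]+scalar*c[j]  -> 2 reads + 1 write = 3
 * Write-allocate reads are NOT counted (assumption matching STREAM). */
static const int words_per_elem[4] = {2, 2, 3, 3};

/* Hypothetical helper: rate in MB/s (1 MB = 1e6 bytes) from the best time. */
double stream_rate_mbs(int words, size_t n, double min_time_s)
{
    assert(min_time_s > 0.0);
    double bytes = (double)words * sizeof(double) * (double)n;
    return 1.0e-6 * bytes / min_time_s;
}

/* Hypothetical helper: size of one array in MiB (1 MiB = 2^20 bytes). */
double array_mib(size_t n)
{
    return (double)n * sizeof(double) / (1024.0 * 1024.0);
}

/* Print per-array memory and the four kernel rates for array size n. */
void print_report(size_t n, const double min_time[4])
{
    static const char *names[4] = {"Copy", "Scale", "Add", "Triad"};
    printf("Memory per array = %.1f MiB\n", array_mib(n));
    for (int k = 0; k < 4; k++)
        printf("%-6s %8.1f MB/s\n", names[k],
               stream_rate_mbs(words_per_elem[k], n, min_time[k]));
}
```

For example, calling print_report(10000000, (double[]){0.077225, 0.076479, 0.201155, 0.203539}) from a main function reproduces the 76.3 MiB per array and the Copy/Scale/Add/Triad rates in the table above, to within rounding.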
Why is the bandwidth so low?
What am I doing wrong?
Is this the expected memory bandwidth for the P550 (EIC7700)?