I’ve managed to get perf working with the Unmatched PMU in Linux.
The most recent OpenSBI needs to be patched, and the Linux patches from Atish Patra are needed, plus a single additional patch (if you need the firmware instret/cycle counts spent in M-mode only).
branch-instructions OR branches [Hardware event]
branch-misses [Hardware event]
bus-cycles [Hardware event]
cache-misses [Hardware event]
cache-references [Hardware event]
cpu-cycles OR cycles [Hardware event]
instructions [Hardware event]
ref-cycles [Hardware event]
stalled-cycles-backend OR idle-cycles-backend [Hardware event]
stalled-cycles-frontend OR idle-cycles-frontend [Hardware event]
The reason for the OpenSBI patches is that it currently relies on mcountinhibit, which is absent on the U740 (Unmatched); I don’t see any real need for such a strict check.
Not sure if it’s due to something I messed up while rebasing, but I had to make the following change to OpenSBI to get it to compile; otherwise gcc would throw “error: label at end of compound statement”.
Thanks! I managed to compile OpenSBI (PLATFORM=generic, built into U-Boot), and the kernel with the PMU driver included and CONFIG_RISCV_PMU_SBI enabled.
The device is visible in /sys/bus/platform/drivers/riscv-pmu and
$ dmesg|grep PMU
[ 3.933499] SBI PMU extension is available
I see the extra hardware events (such as branch-instructions) in perf list.
However, I don’t seem to be getting any events. I tried perf top -e branch-instructions and perf top -e cycles. The numbers stay at 0, even though some things are happening on the system. It only seems to work with software events like cpu-clock. Not sure if it’s related, but there are some of these in dmesg:
[ 3554.578191] Starting counter idx 0 failed with error -524
[ 5278.344707] Starting counter idx 2 failed with error -524
(-524 is ENOTSUPP, apparently)
Edit: oh, the command line from the LKML post does work:
# perf stat -e r8000000000000005 -e r8000000000000007 -e r8000000000000006 -e r0000000000020002 -e r0000000000020004 -e branch-misses -e cache-misses -e dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses -e cycles -e instructions hackbench --pipe 15 process
Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
Each sender will pass 100 messages of 100 bytes
Time: 0.408
Performance counter stats for 'hackbench --pipe 15 process':
214 r8000000000000005 (53.71%)
2,300 r8000000000000007 (62.60%)
3,119 r8000000000000006 (68.50%)
<not counted> r0000000000020002 (0.00%)
<not counted> r0000000000020004 (0.00%)
<not counted> branch-misses (0.00%)
<not counted> cache-misses (0.00%)
<not counted> dTLB-load-misses (0.00%)
<not counted> dTLB-store-misses (0.00%)
<not counted> iTLB-load-misses (0.00%)
934,956,767 cycles (21.07%)
539,665,451 instructions # 0.58 insn per cycle (40.55%)
0.592143959 seconds time elapsed
0.425950000 seconds user
1.462119000 seconds sys
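For longer event lists, the events that came back as <not counted> can be pulled out of the perf stat output with a small filter. This is only a sketch: `not_counted_events` is a name made up here, and it assumes the plain-text layout shown above, where the event name is the third whitespace-separated field.

```shell
# Hypothetical helper: print the event name of every "<not counted>" line
# read from stdin. Assumes the default perf stat text layout.
not_counted_events() {
    awk '$1 == "<not" && $2 == "counted>" { print $3 }'
}

# Example input copied from the output above.
printf '%s\n' \
    '3,119 r8000000000000006 (68.50%)' \
    '<not counted> branch-misses (0.00%)' \
    '<not counted> cache-misses (0.00%)' | not_counted_events
```

perf stat writes its report to stderr, so on a live run the filter would need 2>&1 before the pipe.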
Tried it a few times; every time it stays at “Collecting samples…”.
Looks like “perf record” gets no samples, either. Maybe it’s related?
# perf record -e cycles -e instructions -c 1000 hackbench
Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
Each sender will pass 100 messages of 100 bytes
Time: 0.917
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.047 MB perf.data ]
# perf report --stdio
Error:
The perf.data data has no samples!
No problem. I’m really happy to see this. I tried to get perf to work with the performance counters on the Unleashed board once but it went nowhere, too many levels of abstraction in between.
Speaking of which: don’t there need to be pmu and pmu,event-to entries in the DTS file for the board, for the vendor-specific counters? (and accompanying u-boot patch) Or are these general counters always available?
Perf record will not work on the HiFive Unmatched, as it doesn’t implement the sscofpmf extension, which provides local counter overflow interrupts.
However, the Linux PMU driver should print that event sampling is not supported in the absence of an sscofpmf implementation. I will look into that.
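A quick way to see whether a given ISA string advertises the extension is a small shell check. This is a rough sketch, not what the driver does: `has_isa_ext` is a made-up helper, it assumes multi-letter extensions are separated by underscores, and whether your kernel or device tree lists sscofpmf in the ISA string at all is also an assumption.

```shell
# Hypothetical helper: succeed if extension $2 appears as an
# underscore-separated component of ISA string $1 (case-insensitive).
has_isa_ext() {
    isa=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    ext=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    case "_${isa}_" in
        *"_${ext}_"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example strings; the second one is made up for illustration.
for isa in "rv64imafdc" "rv64imafdc_zicsr_sscofpmf"; do
    if has_isa_ext "$isa" sscofpmf; then
        echo "$isa: sscofpmf present, sampling may be possible"
    else
        echo "$isa: no sscofpmf, stick to perf stat"
    fi
done
```

On a running system the first argument could come from the isa line in /proc/cpuinfo.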
Yes, sscofpmf makes sense; I hadn’t thought about it.
# perf stat sleep 1
Performance counter stats for 'sleep 1':
1.23 msec task-clock # 0.001 CPUs utilized
1 context-switches # 815.661 /sec
0 cpu-migrations # 0.000 /sec
45 page-faults # 36.705 K/sec
1468356 cycles # 1.198 GHz
508982 instructions # 0.35 insn per cycle
69255 branches # 56.489 M/sec
25223 branch-misses # 36.42% of all branches
1.002246000 seconds time elapsed
0.002639000 seconds user
0.000000000 seconds sys
Now it’s displayed correctly; see the table.
You still can’t use sampling (perf record, perf top) with a hardware counter as the group leader, and that won’t become possible, but the error is now displayed correctly:
# perf record
Error:
cycles: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
You can still record with task-clock, cpu-clock (or any software counter) as the leader, if that is useful to you, e.g.:
# perf record -e '{cpu-clock,cycles,instructions,branches}:s' sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.003 MB perf.data (36 samples) ]
This is great, but I’m curious how we can validate that the counters are accurate. With more events than counters, is the kernel automatically multiplexing them? These values could then be more of an estimate than an actual count, so I’m wondering how we can gain confidence.
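On the multiplexing question: as far as I know, when events are multiplexed perf stat reports an estimate, the raw count scaled by time_enabled / time_running, and the percentage in parentheses (as in the hackbench output above) is the share of time the event was actually on a counter. A minimal sketch of that arithmetic; `scale_count` and the numbers are made up for illustration:

```shell
# Hypothetical helper: scale a raw count the way multiplexed counts are
# extrapolated. $1 = raw count, $2 = time enabled, $3 = time running.
scale_count() {
    awk -v raw="$1" -v ena="$2" -v run="$3" 'BEGIN {
        if (run == 0) { print "<not counted>"; exit }
        printf "%.0f\n", raw * ena / run
    }'
}

scale_count 1000 100 25   # event was on a counter 25% of the time
scale_count 1000 100 100  # never descheduled: the count is exact
scale_count 0 100 0       # never scheduled
```

One way to gain confidence is to request no more events than there are counters, so that no percentage is printed (as in the perf stat sleep 1 output above), and compare that against a multiplexed run of the same workload.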