Can't sample the instructions event with perf record

Hi all,
Unfortunately, I'm having trouble sampling the instructions event with perf record.
Maybe someone knows how to fix it?

I have a stock Ubuntu 24.04.3 LTS install with the default kernel 6.6.77.2-premier and the matching perf package.

apt list | grep 'linux.*tools' | grep 6.6.77-2

linux-premier-tools-6.6.77-2/noble,now 6.6.77-2.2 riscv64 [installed,automatic]

linux-tools-6.6.77-2-premier/noble,now 6.6.77-2.2 riscv64 [installed]
linux-tools-premier/noble,now 6.6.77-2.2 riscv64 [installed]

When I count instructions with perf stat, everything seems to work.

perf stat -e instructions /usr/bin/hostname
nnpdev550p01
Performance counter stats for '/usr/bin/hostname':
931727 instructions

But when I try to sample the same event with perf record, the report surprisingly shows 0 samples for instructions.

perf record -vvv -g -c 1000 -e cpu-clock,instructions  --call-graph dwarf /usr/bin/hostname
callchain: type FP
Using CPUID 0x489-0x8000000000000008-0x6220425
callchain: type DWARF
callchain: stack dump size 8192
DEBUGINFOD_URLS=
nr_cblocks: 0
affinity: SYS
mmap flush: 1
comp level: 0
perf record opening and mmapping events
Opening: cpu-clock
------------------------------------------------------------
perf_event_attr:
  type                             1 (software)
  size                             136
  config                           0 (PERF_COUNT_SW_CPU_CLOCK)
  { sample_period, sample_freq }   1000
  sample_type                      IP|TID|TIME|ADDR|CALLCHAIN|ID|REGS_USER|STACK_USER|DATA_SRC
  read_format                      ID|LOST
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  enable_on_exec                   1
  task                             1
  mmap_data                        1
  sample_id_all                    1
  exclude_guest                    1
  exclude_callchain_user           1
  mmap2                            1
  comm_exec                        1
  ksymbol                          1
  bpf_event                        1
  sample_regs_user                 0xffffffff
  sample_stack_user                8192
------------------------------------------------------------
sys_perf_event_open: pid 2235690  cpu 0  group_fd -1  flags 0x8 = 5
sys_perf_event_open: pid 2235690  cpu 1  group_fd -1  flags 0x8 = 6
sys_perf_event_open: pid 2235690  cpu 2  group_fd -1  flags 0x8 = 7
sys_perf_event_open: pid 2235690  cpu 3  group_fd -1  flags 0x8 = 9
Opening: instructions
------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  size                             136
  config                           0x1 (PERF_COUNT_HW_INSTRUCTIONS)
  { sample_period, sample_freq }   1000
  sample_type                      IP|TID|TIME|ADDR|CALLCHAIN|ID|REGS_USER|STACK_USER|DATA_SRC
  read_format                      ID|LOST
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  sample_id_all                    1
  exclude_guest                    1
  exclude_callchain_user           1
  sample_regs_user                 0xffffffff
  sample_stack_user                8192
------------------------------------------------------------
sys_perf_event_open: pid 2235690  cpu 0  group_fd -1  flags 0x8 = 10
sys_perf_event_open: pid 2235690  cpu 1  group_fd -1  flags 0x8 = 11
sys_perf_event_open: pid 2235690  cpu 2  group_fd -1  flags 0x8 = 12
sys_perf_event_open: pid 2235690  cpu 3  group_fd -1  flags 0x8 = 13
mmap size 528384B
libperf: mmap_per_cpu: nr cpu values 4 nr threads 1
libperf: idx 0: mmapping fd 5
libperf: idx 0: set output fd 10 -> 5
libperf: idx 1: mmapping fd 6
libperf: idx 1: set output fd 11 -> 6
libperf: idx 2: mmapping fd 7
libperf: idx 2: set output fd 12 -> 7
libperf: idx 3: mmapping fd 9
libperf: idx 3: set output fd 13 -> 9
Control descriptor is not initialized
thread_data[0x5557bdf98870]: nr_mmaps=4, maps=0x5557bdf98900, ow_maps=(nil)
thread_data[0x5557bdf98870]: cpu0: maps[0] -> mmap[0]
thread_data[0x5557bdf98870]: cpu1: maps[1] -> mmap[1]
thread_data[0x5557bdf98870]: cpu2: maps[2] -> mmap[2]
thread_data[0x5557bdf98870]: cpu3: maps[3] -> mmap[3]
thread_data[0x5557bdf98870]: pollfd[0] <- event_fd=5
thread_data[0x5557bdf98870]: pollfd[1] <- event_fd=10
thread_data[0x5557bdf98870]: pollfd[2] <- event_fd=6
thread_data[0x5557bdf98870]: pollfd[3] <- event_fd=11
thread_data[0x5557bdf98870]: pollfd[4] <- event_fd=7
thread_data[0x5557bdf98870]: pollfd[5] <- event_fd=12
thread_data[0x5557bdf98870]: pollfd[6] <- event_fd=9
thread_data[0x5557bdf98870]: pollfd[7] <- event_fd=13
thread_data[0x5557bdf98870]: pollfd[8] <- non_perf_event fd=4
perf record done opening and mmapping events
Opening: dummy:

------------------------------------------------------------
perf_event_attr:
  type                             1 (software)
  size                             136
  config                           0x9 (PERF_COUNT_SW_DUMMY)
  watermark                        1
  sample_id_all                    1
  bpf_event                        1
  { wakeup_events, wakeup_watermark } 1
------------------------------------------------------------
sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 14
sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 15
sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 16
sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 17
mmap size 528384B
libperf: mmap_per_cpu: nr cpu values 4 nr threads 1
libperf: idx 0: mmapping fd 14
libperf: idx 1: mmapping fd 15
libperf: idx 2: mmapping fd 16
libperf: idx 3: mmapping fd 17
Synthesizing TSC conversion information
Synthesizing id index
perf record has started
myhost
[ perf record: Woken up 1 times to write data ]
overlapping maps in /usr/bin/hostname (disable tui for more info)
overlapping maps in /usr/lib/riscv64-linux-gnu/ld-linux-riscv64-lp64d.so.1 (disable tui for more info)
overlapping maps in /usr/lib/riscv64-linux-gnu/libc.so.6 (disable tui for more info)
overlapping maps in //anon (disable tui for more info)
overlapping maps in //anon (disable tui for more info)
overlapping maps in /usr/lib/riscv64-linux-gnu/libc.so.6 (disable tui for more info)
overlapping maps in /usr/bin/hostname (disable tui for more info)
overlapping maps in /usr/lib/riscv64-linux-gnu/ld-linux-riscv64-lp64d.so.1 (disable tui for more info)
symbol:unmap_start file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:unmap_complete file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:map_start file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:reloc_start file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:map_complete file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:reloc_complete file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:init_start file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:init_complete file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:lll_lock_wait_private file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:lll_lock_wait file:(null) line:0 offset:0 return:0 lazy:(null)
failed to write feature CPUDESC
failed to write feature NUMA_TOPOLOGY
failed to write feature MEM_TOPOLOGY
failed to write feature HYBRID_TOPOLOGY
[ perf record: Captured and wrote 0.117 MB perf.data (13 samples) ]

perf report -g --no-children
Available samples
13 cpu-clock

0 instructions

Sorry for the long log, but it may be useful.

What am I doing wrong?

How can I sample instructions and other HW events with perf record?

I’m still looking for a workaround and haven’t found a fully suitable one yet. I’ve tried using a raw event instead of the predefined one.

Even so, this may be useful for someone.

I found a list of PMU events for SiFive.

Used directly, these events don’t work either, but if I sum all the EventCode values, the result is 3ffff00.

c=$(grep EventCode /tmp/1 | sed -E "s/\"//g; s/,//g; s/^.*: //g; s/0x//g" | paste -s -d "+")

echo "obase=16; ibase=16; $c" | bc -l

3FFFF00
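
The sum above matches a bitwise OR only when no two codes share a bit. Below is a minimal sketch of the same calculation with shell arithmetic, which also avoids bc's need for upper-case hex digits. It assumes, and the event list quoted in this thread does not confirm it, that each EventCode is a single mask bit in bits 8 to 25 with the low byte left at 0. or_event_codes is a hypothetical helper, not part of perf.

```shell
#!/bin/sh
# Hypothetical helper: OR together hex event codes (with or without a 0x
# prefix) and print the result in perf's raw-event form (rNNN).
or_event_codes() {
    acc=0
    for code in "$@"; do
        code=${code#0x}
        acc=$(( acc | 0x$code ))
    done
    printf 'r%x\n' "$acc"
}

# ASSUMPTION: one mask bit per sub-event, bits 8..25, event class 0.
codes=""
bit=8
while [ "$bit" -le 25 ]; do
    codes="$codes $(printf '0x%x' $(( 1 << bit )))"
    bit=$(( bit + 1 ))
done

# Word splitting of $codes is intentional: one argument per code.
or_event_codes $codes
```

Under that assumption the script prints r3ffff00, the same value bc returned. If two codes did share a bit, the sum and the OR would differ, and the OR is the value that selects all of the sub-events.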

I checked it this way:

perf stat -v -e r3ffff00,instructions /usr/bin/hostname
Using CPUID 0x489-0x8000000000000008-0x6220425
Control descriptor is not initialized
myhost
r3ffff00: 891047 987000 987000
instructions: 908354 987000 987000
Performance counter stats for '/usr/bin/hostname':
891047 r3ffff00
908354 instructions

In perf stat mode, the count for this event code tracks the instructions count quite closely.

That is why I think it works.
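
To put a number on "closely", here is a small sketch that computes the ratio of the two counts from the perf stat output above. ratio_pct is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print a/b as a percentage with one decimal place.
ratio_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f%%\n", 100 * a / b }'
}

# Counts copied from the perf stat run above: r3ffff00 vs. instructions.
ratio_pct 891047 908354
```

For this run it prints 98.1%, so the raw event sees about 2% fewer events than instructions does.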

My hypothesis was then that passing this event code directly to perf record might solve my problem. As I understand it, when the event code is given directly, perf record handles it differently.

It works:

perf record -v -g -c 10000 -e r3ffff00 /usr/bin/hostname

perf report -g --no-children
Samples: 109  of event 'r3ffff00', Event count (approx.): 1090000
  Overhead  Command   Shared Object                Symbol
+    4.59%  hostname  ld-linux-riscv64-lp64d.so.1  [.] _dl_lookup_symbol_x
+    3.67%  hostname  ld-linux-riscv64-lp64d.so.1  [.] _dl_relocate_object
+    3.67%  hostname  ld-linux-riscv64-lp64d.so.1  [.] strcmp
+    1.83%  hostname  ld-linux-riscv64-lp64d.so.1  [.] do_lookup_x
+    1.83%  hostname  [unknown]                    [k] 0xffffffff800bf29e
+    1.83%  hostname  [unknown]                    [k] 0xffffffff800c1e6a
+    1.83%  hostname  [unknown]                    [k] 0xffffffff800c1e7a
+    1.83%  hostname  [unknown]                    [k] 0xffffffff8020230a
+    1.83%  hostname  [unknown]                    [k] 0xffffffff80248034
+    1.83%  hostname  [unknown]                    [k] 0xffffffff806167d2
+    0.92%  hostname  [kernel.kallsyms]            [k] 0x0000507f80709be4
+    0.92%  hostname  ld-linux-riscv64-lp64d.so.1  [.] __GI___tunables_init
+    0.92%  hostname  ld-linux-riscv64-lp64d.so.1  [.] _dl_hwcaps_split
+    0.92%  hostname  libc.so.6                    [.] __ctype_init
+    0.92%  hostname  libc.so.6                    [.] __internal_atexit
+    0.92%  hostname  libc.so.6                    [.] strlen
+    0.92%  hostname  [unknown]                    [k] 0xffffffff800065ae
+    0.92%  hostname  [unknown]                    [k] 0xffffffff8000e124

My updated questions are:

  1. Why doesn’t it work with -e instructions?

  2. How can I sample other events with perf record?

The “instructions” event uses the minstret counter, while the sub-events from the instruction.json file use the mhpmcounters.
With the Sscofpmf extension, the mhpmcounters support an overflow interrupt, which is what sampling needs. The fixed counters (mcycle, minstret) don’t support sampling.
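
As a rough check, the ISA string the kernel reports can be searched for sscofpmf. This is a minimal sketch, assuming /proc/cpuinfo has an "isa" line listing multi-letter extensions separated by underscores. has_sscofpmf is a hypothetical helper, and a missing entry does not rule out a vendor-specific overflow mechanism.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given RISC-V ISA string lists sscofpmf.
has_sscofpmf() {
    case "_$1_" in
        *_sscofpmf_*) return 0 ;;
        *) return 1 ;;
    esac
}

# Take the first "isa" line from /proc/cpuinfo; empty if there is none.
isa=$(sed -n 's/^isa[[:space:]]*:[[:space:]]*//p' /proc/cpuinfo 2>/dev/null | head -n 1)

if has_sscofpmf "$isa"; then
    echo "sscofpmf listed"
else
    echo "sscofpmf not listed"
fi
```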

We can record the instruction sub-events by passing a comma-separated list, as below, and then get the report.
$ perf record -vvv -g -c 1000 -e integer_load_retired,integer_store_retired,system_instruction_retired --call-graph dwarf /usr/bin/hostname
$ perf report -g --no-children

To record all the sub-events, pass all of them as a comma-separated list.
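
Typing the full list by hand is error-prone, so it could be built from the JSON file instead. This is a minimal sketch, assuming the file uses the usual "EventName": "..." entries of perf's pmu-events JSON files. event_list is a hypothetical helper, and the sample file is made up from the three names used in the command above.

```shell
#!/bin/sh
# Hypothetical helper: print every EventName in a pmu-events JSON file as one
# comma-separated list, lower-cased to match the spelling used above.
event_list() {
    grep -o '"EventName"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" |
        sed -E 's/.*:[[:space:]]*"([^"]*)"/\1/' |
        tr 'A-Z' 'a-z' |
        paste -s -d ',' -
}

# Made-up sample file (ASSUMPTION: the real instruction.json has more entries
# and more fields per entry).
sample=$(mktemp)
cat > "$sample" <<'EOF'
[
  { "EventName": "INTEGER_LOAD_RETIRED" },
  { "EventName": "INTEGER_STORE_RETIRED" },
  { "EventName": "SYSTEM_INSTRUCTION_RETIRED" }
]
EOF

event_list "$sample"
rm -f "$sample"
```

For the sample file this prints integer_load_retired,integer_store_retired,system_instruction_retired, which can be passed to perf record after -e. Whether every sub-event can be counted at the same time probably depends on how many mhpmcounters are available.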

Hi pinkesh,
I’ve found a workaround for this.
For instructions: r3ffff00
For cpu-cycles: as I understand it, the best choice is simply cpu-clock
As I understand it now, the SiFive P550 has some hardware limitations, so in general this is a suitable workaround in the current environment.