QEMU performance for U54-mc

Hi

I’m trying to figure out how I can measure a function’s performance using QEMU, since from what I’ve heard it’s not very cycle-accurate.

Right now I have one of the U54-mc’s harts running a while loop as a makeshift timer, but that is giving me weird results.
If I have a single hart execute the whole function by itself, the measured time is twice as long as when I split the function between two harts but make the second one wait for the first to finish its part.

So I want to know what could be causing this difference: whether it could be some QEMU or Freedom Studio setting, or perhaps something else. Any help is appreciated.

QEMU is in no way intended to produce cycle-accurate or even speed-representative results. QEMU’s purpose is to emulate RISC-V code as quickly as possible, within the compromises needed to keep QEMU portable across different host systems.

QEMU has no knowledge of any particular RISC-V CPU core or its performance characteristics. It has functional emulation of various sets of peripherals that might be found on particular boards, but that’s a different matter.

If you want cycle-accurate simulation then you need to do one of:

  • run the RTL for the system in a simulator such as Verilator (at a few tens of kHz effective speed)
  • run the RTL for the system in an FPGA (at 30 to 100 MHz)

or

  • run on actual U54-mc hardware such as the HiFive Unleashed or Icicle.
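
For the hardware and RTL options above, one common way to time a function is to read the RISC-V `cycle` counter before and after the call. Below is a minimal sketch of that idea, not a definitive implementation. The names `read_ticks`, `work_under_test` and `measure_ticks` are hypothetical, the workload is a placeholder, and it assumes the `cycle` CSR is readable at the privilege level you run in (in lower privilege modes it may have to be enabled first). When built for a non-RISC-V host it falls back to `clock()` so the code still compiles and runs there. Under QEMU the counter value will still not be cycle-accurate, for the reasons given above.

```c
#include <stdint.h>
#include <time.h>

/* Read a monotonically increasing tick counter.
 * On RV64 this uses the standard rdcycle pseudo-instruction (cycle CSR).
 * Elsewhere it falls back to clock(), only so the sketch runs on a host PC. */
static inline uint64_t read_ticks(void)
{
#if defined(__riscv) && (__riscv_xlen == 64)
    uint64_t cycles;
    __asm__ volatile ("rdcycle %0" : "=r"(cycles));
    return cycles;
#else
    return (uint64_t)clock();
#endif
}

/* Hypothetical workload standing in for the function being measured:
 * sums the integers 0 .. n-1. The volatile keeps the loop from being
 * optimized away. */
static uint64_t work_under_test(uint64_t n)
{
    volatile uint64_t sum = 0;
    for (uint64_t i = 0; i < n; i++) {
        sum += i;
    }
    return sum;
}

/* Run the workload once and return the elapsed ticks.
 * The workload's result is stored through *result. */
static uint64_t measure_ticks(uint64_t n, uint64_t *result)
{
    uint64_t start = read_ticks();
    *result = work_under_test(n);
    uint64_t end = read_ticks();
    return end - start;
}
```

Calling `measure_ticks(1000, &r)` on the same hart that runs the function avoids depending on a second hart spinning in a loop as a timer. On real hardware it is usually worth repeating the measurement several times, since caches and interrupts can change the count from run to run.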

Thanks for the clarification, I’ll try that