Poor Dhrystone performance

Yep, I followed the methodology described in the freedom-e-sdk README.md, using gcc-6.1.0.

Thanks. I just want to confirm whether the CoreMark/MHz scores differ between the HiFive1 and the Arty FPGA.

What frequency is your Freedom E300 board configured for?

I am using a HiFive1 with the default setting (260 MHz).

I played with several compiler options on my HiFive1 board.

Thanks for the frequency info.
So the CoreMark/MHz can even reach 2.8 (728/260) on the HiFive1, better than the 2.73 CoreMark/MHz given in the spec (https://www.sifive.com/products/hifive1/).

I am getting poor Dhrystone performance (50000) on both the HiFive1 and Arty boards, but good performance for CoreMark (738). I went through the comments in this thread, but I believe the dhry_stubs.c code has changed since January, along with init.c.
I have played with the compiler flags as mentioned in one of the comments above, but that made no difference for me.

I am seeing the same problem with Dhrystone. I ran this about 6 weeks ago and was scoring 740740 @ 269 MHz (1.57 DMIPS/MHz), which seemed to tie in with others.

I had a crash this week and reinstalled everything; now I am only getting 45454. Looks like something has changed.

As 45454/740740 * 269 = 16.5, one might imagine that it’s because you’re running the board with the default 16 MHz crystal clock now?

It could well be. I assume this is set in the init.c file? I used the “standard” init.c that was downloaded from GitHub / getting started with HiFive1.

Can you point me to the one that I should use please?

My copy is somewhat old but has in bsp/env/freedom-e300-hifive1/init.c:

static void use_default_clocks()
{
  // Turn off the LFROSC
  AON_REG(AON_LFROSC) &= ~ROSC_EN(1);

  // Use HFROSC
  use_hfrosc(4, 16);
}

The functions available are described in bsp/drivers/fe300prci/fe300prci_driver.h though I think the comments could be a little more useful.

Thanks for this. I have the same code in my init.c. It's weird, as I am getting the “correct” values for CoreMark, which I assume uses the same init.c file to set up the clocks.

And looking in git, that stuff hasn’t changed since February 2.

Looks like you should use

PRCI_set_hfrosctrim_for_f_cpu(uint32_t f_cpu, PRCI_freq_target target )

where f_cpu is the approximate frequency you want and target is one of PRCI_FREQ_CLOSEST or PRCI_FREQ_UNDERSHOOT.

Except that doesn’t set the dividers for the SPI, or I suppose for the UART, which is bad. Some of that is done in use_pll() in init.c, but that takes a whole load of arguments and isn’t documented at all. Nor is it documented how to use the two functions together, or whether you should.


Thanks for the reply. I am still not sure why my CoreMark numbers are OK and my Dhrystone numbers are not; I assume they are using the same clock setup.

I deleted everything and started with a fresh install. I followed this

The code reports that it’s running at 280 MHz and scoring 47619.

Which optimization option are you using for Dhrystone?
As I reported on Jan 2nd in this thread above, -O or -Os gave us very poor performance.

-Os (and -O too) gives about 20 times worse performance (I cannot explain how this can happen).

Sorry, I have not checked. I am using this “straight out of the box”, i.e. the code, Makefiles etc. are straight off the GitHub site, and I was assuming they would be set correctly. I will check.

I removed -Os from the Makefile, still no change.

No optimization flag and -Os behave the same. Try -O2, for example.

I removed -Os and -O2, still no joy.

Something that will slow things down extremely is if there is const static data (e.g. a string or array of data) which the code reads in inner loops, and it gets read from the SPI flash every time – which takes more than 1 us per access (doesn’t matter too much whether byte, short, word). If the compiler thinks the data is writable it will copy it to SRAM at program startup and go far faster (if it fits in 16 KB, obviously). Alternatively, if optimisation is such that constants are formed in code using load {upper} immediate then the code will also run fast.

I guess I was just expecting an “out of the box experience” where I could download the code and example from GitHub and it would all run without having to change makefiles/compiler options etc.

It’s a microcontroller with code in flash, not a Linux machine.

No different really to an AVR or other machine where the program code and constant data are in a totally different address space to RAM, and loads from program space require special instructions caused by special progmem attributes on the declarations.