Benchmarking Security in the HiFive1

dkhayes117 · November 8, 2020, 1:53am

No, the other was computation only. I rewrote the program to check my syscall overhead. Next, I’m going to time a context switch, assuming that the PMP would need to be reconfigured.

dkhayes117 · November 11, 2020, 2:36am

My original test was using the debug version. I ran the sieve program again as a release version with optimizations.

cycles: 186673
instructions: 130662
CPI: 1.429

This is about 30 times as fast as before! I had no idea that using a release version (which takes longer to compile) would affect it that much. I need to test my system call again with a release version.
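For reference, the CPI figure above is just the cycle count divided by the retired-instruction count. A minimal sketch of that arithmetic, where the function name `cpi` is only illustrative and not part of any crate:

```rust
/// Cycles per instruction, computed from two counter deltas.
/// `cpi` is a hypothetical helper name used only for this sketch.
fn cpi(cycles: u64, instructions: u64) -> f64 {
    cycles as f64 / instructions as f64
}

fn main() {
    // Counter values reported for the optimized sieve run.
    let value = cpi(186_673, 130_662);
    println!("CPI: {:.3}", value); // prints "CPI: 1.429"
}
```

On the target, the two inputs would come from the cycle and retired-instruction counters read before and after the code under test.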

nick.knight · November 11, 2020, 4:20am

I haven’t been keeping up with this thread… are you talking about Freedom-E-SDK’s debug versus release CONFIGURATIONs? (If not, please disregard the following.)

If so, IIRC, debug uses -O0 whereas release uses -O2. Consider trying -O3, or even -Ofast. Take a look at the GCC docs to get an idea what these mean.

bruce · November 11, 2020, 6:17am

omg. There’s never any good reason to use -O0. GCC makes just awful code with that – and it’s not even easier to debug. Always at least -O1!

dkhayes117 · November 11, 2020, 11:17am

I’m using the Rust toolchain (cargo and rustc), which is built on LLVM.

dkhayes117 · November 14, 2020, 8:28pm

I got my system calls down to 314 instructions at 2290 cycles (~7.16us @ 320MHz). The context switch, as far as re-configuring the PMP registers goes, came in at about 5k cycles, which is ~16us @ 320MHz. I don’t really need a context switch in my programming, and a real switch would be more involved than just changing the PMP configs and addresses. I was just curious.
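The microsecond figures follow from dividing the cycle count by the core clock. A small sketch of that conversion, assuming the 320MHz clock quoted above (the helper name `cycles_to_micros` is made up for illustration):

```rust
/// Convert a cycle count to microseconds at a given core clock.
/// `cycles_to_micros` is a hypothetical helper, not an API from any crate.
fn cycles_to_micros(cycles: u64, clock_hz: u64) -> f64 {
    cycles as f64 * 1_000_000.0 / clock_hz as f64
}

fn main() {
    // Assumed core clock, taken from the figures in the post.
    const CLOCK_HZ: u64 = 320_000_000;

    // System call: 2290 cycles.
    println!("syscall: {:.2} us", cycles_to_micros(2290, CLOCK_HZ));
    // PMP reconfiguration: roughly 5000 cycles.
    println!("pmp switch: {:.2} us", cycles_to_micros(5000, CLOCK_HZ));
}
```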

bruce · November 14, 2020, 10:57pm

That seems like a surprising mismatch between instructions and cycles.

I wonder if the code is reading constants out of the SPI flash – including such things as virtual function dispatch tables, if Rust has those. It’s really really slow to read data from the flash as there is no dcache.

I don’t know how Rust sets things up … or Metal for that matter … but in the old sdk there is a setting at line 84 in https://github.com/sifive/freedom-e-sdk/blob/v1_0/bsp/env/freedom-e300-hifive1/init.c:

// Div = f_sck/2
SPI0_REG(SPI_REG_SCKDIV) = 8;

With the 8 setting and a 256 MHz main clock it runs the SPI at 32 MHz which is excessively slow for loading code from flash to icache, but more importantly for loading constant data. The flash spec says it can run at 133 MHz. For my own use at 256 MHz I change the setting to 2 (as in the comment) to run the flash at 128 MHz. Maybe for 320 MHz you’d need to use 3, giving 107 MHz. Also, I think the flash is quad SPI but is only being run at single.

If you can find where this is set up in Rust then maybe you can experiment with this and it might make your code much faster.

Or else make sure you’re not doing any data loads from the flash memory range by moving any frequently accessed constant data to RAM – you can get the linker to do this, or else simply declaring it non-constant will do it too.

dkhayes117 · November 16, 2020, 2:32am

I believe the cycle disparity is definitely flash/cache related, but really it is above my pay grade at this point. There are Rust e310x and e310x-hal crates, but most of it looks like something that came from the Roswell crash site: https://github.com/riscv-rust/e310x/blob/master/src/common/qspi0/sckdiv.rs
I might get to that level one day. For now, it works the way I need it to.

dkhayes117 · November 21, 2020, 4:36pm

I may not have run enough iterations when timing my system calls. I rewrote the program slightly and made sure I used the release profile. 1000 iterations came back as 134 instructions and 194 cycles. That makes much more sense.

I also added an I2C temperature sensor, a PCT2075, and read the temperature several times in machine mode, then from U-mode with system calls. With optimizations, U-mode came back as only 100 cycles more per read.

bruce · November 21, 2020, 10:29pm

That is much more in line with what I’d expect.

Measuring cycles is tricky. You have to make sure to pre-run all the code you’re measuring – including the measuring code itself.
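The pre-running advice can be sketched roughly as below. This is only an illustration under stated assumptions: `measure_warm` is a made-up helper, and `read_counter` is a stand-in closure for whatever reads the cycle counter on the target (on the HiFive1 that would presumably be the `mcycle` CSR). The `main` here runs on a host and uses a wall-clock timer in place of a cycle counter.

```rust
/// Average counter delta per run of `work`, measured after a warm-up phase.
/// Both the code under test and the counter read are pre-run so that
/// neither is fetched from slow memory inside the timed region.
fn measure_warm<C, W>(mut read_counter: C, mut work: W, warmup_runs: u32, timed_runs: u32) -> u64
where
    C: FnMut() -> u64,
    W: FnMut(),
{
    // Pre-run the code being measured.
    for _ in 0..warmup_runs {
        work();
    }
    // Pre-run the measuring code itself.
    let _ = read_counter();
    let _ = read_counter();

    let start = read_counter();
    for _ in 0..timed_runs {
        work();
    }
    let end = read_counter();

    // Integer average; wrapping_sub tolerates a counter that wrapped once.
    end.wrapping_sub(start) / u64::from(timed_runs.max(1))
}

fn main() {
    use std::time::Instant;

    // Host-side stand-in: nanoseconds since `t0` instead of a cycle CSR.
    let t0 = Instant::now();
    let mut acc = 0u64;
    let avg = measure_warm(
        || t0.elapsed().as_nanos() as u64,
        || acc = acc.wrapping_add(std::hint::black_box(1)),
        10,
        1000,
    );
    println!("average delta per run: {} (acc = {})", avg, acc);
}
```

Averaging over many timed runs also addresses the earlier concern about not running enough iterations, since one-off costs such as a cold cache get amortized.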
