Benchmarking Security in the HiFive1

dkhayes117 · October 9, 2020, 5:52pm

@bruce For part of my project, I need to benchmark the difference between running in machine mode versus user mode. It seems that there should be a performance hit from PMP checking while in u mode. My first thought was to run a sorting algorithm once in m mode and once in u mode and compare the instruction counts using the mhpmcounter and hpmcounter registers. Does this seem like a correct and simple way to compare the two?
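On an RV32 core such as the one in the HiFive1, the cycle and retired-instruction counters are 64 bits wide but are read as two 32-bit CSR halves (for example `mcycle` and `mcycleh`), so a read has to guard against the low half wrapping between the two accesses. Below is a minimal sketch of that hi-lo-hi pattern and of a start/stop measurement built on it. The names `read_counter64` and `measure` and the closure parameters are hypothetical, chosen for illustration; the CSR reads are passed in as closures so the logic can be tried off-target.

```rust
/// Combine two 32-bit counter halves into one consistent 64-bit value.
/// `read_lo` / `read_hi` stand in for CSR reads (e.g. mcycle / mcycleh on
/// RV32); they are hypothetical parameters for this sketch.
fn read_counter64(
    mut read_lo: impl FnMut() -> u32,
    mut read_hi: impl FnMut() -> u32,
) -> u64 {
    loop {
        let hi = read_hi();
        let lo = read_lo();
        // If the high half changed, the low half wrapped between the two
        // reads and the pair is inconsistent, so try again.
        if read_hi() == hi {
            return ((hi as u64) << 32) | lo as u64;
        }
    }
}

/// Count events (cycles or retired instructions) across one workload,
/// using any 64-bit counter source.
fn measure(mut counter: impl FnMut() -> u64, workload: impl FnOnce()) -> u64 {
    let start = counter();
    workload();
    // wrapping_sub keeps the difference correct even if the counter rolls over.
    counter().wrapping_sub(start)
}
```

On the board, the closures would wrap the real CSR reads. In U-mode that means the unprivileged `cycle`/`instret` aliases, which M-mode has to enable through `mcounteren` before dropping privilege.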

bruce · October 11, 2020, 12:02pm

Sorting could work, but it might be hard to find a reasonable algorithm that takes long enough to run with maximum 16 KB of data.

My own counting primes benchmark takes quite a bit of time without using a huge amount of RAM: http://hoult.org/primes.txt

I’m pretty sure I know what your results will be, but I won’t spoil your fun. It would certainly be interesting if I’m wrong.

dkhayes117 · October 12, 2020, 4:03pm

Thanks for the link. I will port this concept to Rust and give it a shot.

dkhayes117 · October 13, 2020, 3:08pm

I haven’t tested this on the HiFive1B yet, but it should do what I need. (I won’t be printing the values out)

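```rust
fn main() {
    // Index i holds the candidate i; 0 marks "not prime".
    let mut primes: [usize; 1000] = [0; 1000];

    // Indices 0 and 1 stay 0 because neither is prime.
    for i in 2..primes.len() {
        primes[i] = i;
    }

    // Every surviving entry is prime; cross off its multiples.
    for i in 0..primes.len() {
        let factor = primes[i];
        if factor != 0 {
            sieve(&mut primes, factor);
        }
    }

    for i in 0..primes.len() {
        if primes[i] != 0 {
            println!("{}", primes[i]);
        }
    }
}

/// Zero every entry that is a proper multiple of `factor`.
fn sieve(primes: &mut [usize], factor: usize) {
    for i in 0..primes.len() {
        let value = primes[i];
        if value != 0 && value != factor && value % factor == 0 {
            primes[i] = 0;
        }
    }
}
```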

dkhayes117 · November 4, 2020, 1:44am

@bruce I ran a sieve on a 1000-element array with 100 iterations in user mode and in machine mode. Here are my results:

Total Instructions:     49176310
Avg Cycle Count M-Mode: 75530226
Avg Cycle Count U-Mode: 76465046

1.238% performance loss
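For reference, the percentage above is the U-mode cycle count measured relative to the M-mode baseline. A small sketch of that arithmetic follows; `overhead_percent` is a hypothetical helper name, and floating point is used here only because this is meant as a host-side calculation.

```rust
/// Relative slowdown of `measured` versus `baseline`, in percent.
/// A negative result means `measured` was faster than `baseline`.
fn overhead_percent(baseline: u64, measured: u64) -> f64 {
    (measured as f64 - baseline as f64) / baseline as f64 * 100.0
}
```

Calling `overhead_percent(75_530_226, 76_465_046)` with the averages posted above gives roughly 1.238.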

bruce · November 4, 2020, 2:02am

What are you actually timing? Does it include the I/O? I don’t see any timing code in what you posted.

Also, the whole point of a “sieve” algorithm is that you don’t need to do any division operations.
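To illustrate the point about division: a sieve of Eratosthenes crosses off composites by stepping through the multiples of each prime, so the inner loop needs only additions and stores. The following is a minimal sketch of that idea, not the code from the linked benchmark; the `bool` table layout is an assumption made for brevity.

```rust
/// Sieve of Eratosthenes: after the call, `is_prime[i]` is true exactly
/// when `i` is prime. No `%` or `/` is used anywhere.
fn sieve(is_prime: &mut [bool]) {
    let n = is_prime.len();
    is_prime.fill(true);
    // 0 and 1 are not prime; `take` keeps this safe for very short tables.
    for slot in is_prime.iter_mut().take(2) {
        *slot = false;
    }
    let mut p = 2;
    while p * p < n {
        if is_prime[p] {
            // Start at p*p: smaller multiples of p were already crossed
            // off while handling smaller primes.
            let mut multiple = p * p;
            while multiple < n {
                is_prime[multiple] = false;
                multiple += p;
            }
        }
        p += 1;
    }
}
```

Run over a 1000-entry table, this leaves 168 entries set, the number of primes below 1000.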

dkhayes117 · November 4, 2020, 2:27am

I have never done this, so I wouldn’t be surprised if I did something wrong. I didn’t use mtime or time registers, just the mcycle/cycle and minstret/instret. The counts are to complete the prime sieve only. No division operations? Do you mean dividing the cycle count by clock frequency? Also, I planned on timing the UART console printing as a separate test. The only difference being the system call process.

bruce · November 4, 2020, 7:05am

Division.

dkhayes117 · November 4, 2020, 2:42pm

Haha, oh the modulus. Maybe not a true sieve, but it did the job. Are the results surprising to you, or is it what you expected? Again, this timing was around the memory accesses of the sieve only.

bruce · November 4, 2020, 9:14pm

I don’t know any reason U mode would run slower than M mode for pure computation.

dkhayes117 · November 4, 2020, 9:39pm

Would the PMP checks not affect it?

bruce · November 5, 2020, 6:07am

I would be shocked if PMP caused memory accesses to take extra clock cycles. Certainly a CPU designer could do that, but I’d expect they are striving not to.

dkhayes117 · November 6, 2020, 12:27am

I think there is a flaw in my methodology. I ran each test as a separate program, and I’ve been told the timing of cache accesses could vary because of this. I got advice to run both the U-mode and M-mode tests in one binary and throw away the first iteration. I will post the updated results.
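The "throw away the first iteration" advice can be sketched as a small harness: run the workload once untimed so cold-cache effects are paid up front, then average the timed runs. The names `mean_cycles` and `read_cycles` are hypothetical, and the counter is passed in as a closure so the harness itself does not depend on the target. Switching privilege modes between the two measurements is not shown.

```rust
/// Run `workload` once untimed as a warm-up, then return the mean cycle
/// count over `iterations` timed runs.
/// `read_cycles` is a hypothetical stand-in for a cycle-counter read.
fn mean_cycles(
    mut read_cycles: impl FnMut() -> u64,
    mut workload: impl FnMut(),
    iterations: u32,
) -> u64 {
    assert!(iterations > 0, "need at least one timed run");
    workload(); // warm-up run, deliberately not measured
    let mut total: u64 = 0;
    for _ in 0..iterations {
        let start = read_cycles();
        workload();
        total += read_cycles().wrapping_sub(start);
    }
    total / u64::from(iterations)
}
```

Calling the same harness for the M-mode and the U-mode run inside one binary keeps the code layout identical for both measurements.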

dkhayes117 · November 6, 2020, 2:04am

Results are in. Below is the average of 100 iterations, with the program running through the sieve once in each mode before cycle recording starts.

M-Mode Cycles: 6175442
U-Mode Cycles: 6172695
0.044% difference = negligible

bruce · November 6, 2020, 10:19pm

Yes, you absolutely don’t want to count cycles waiting for code to be loaded from SPI flash the first time it is used.

So that explains why the cycles were close to twice the number of instructions executed before, which is pretty unusual.

But now I’m even more confused how 49176310 instructions can run in 6175442 cycles.

dkhayes117 · November 6, 2020, 11:22pm

That is because I made the array smaller in the compiled program.

bruce · November 7, 2020, 2:15am

So how many instructions now?

dkhayes117 · November 7, 2020, 5:02pm

Total instructions: 3997027
CPI ~1.545
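The CPI figure is simply cycles divided by retired instructions. The sketch below assumes the M-mode cycle count posted earlier (6175442) is the one being paired with this instruction count; `cpi` is a hypothetical helper name.

```rust
/// Cycles per instruction for one measured run.
fn cpi(cycles: u64, instructions: u64) -> f64 {
    assert!(instructions > 0, "instruction count must be non-zero");
    cycles as f64 / instructions as f64
}
```

With those inputs, `cpi(6_175_442, 3_997_027)` comes out at about 1.545.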

EDIT:
I also just benchmarked the stock bootloader. I know that it sends out some commands to the ESP32 chip, but I’m not sure what else it may be doing. The average boot time was 3.157 seconds. To measure it, I set a GPIO pin high, reset the board, and timed how long it took before the pin went high again.

dkhayes117 · November 8, 2020, 1:06am

My u-mode system call setup takes 339 instructions more than m-mode calling functions directly (where m-level privilege is required).

bruce · November 8, 2020, 1:28am

So you weren’t timing just the computation, but also some “system calls”?
