I’m using an E21 Standard Core Trial programmed into an Arty 100T, and I recently got interested in its performance. Its documentation, the SiFive E21 Manual v19.05, says:
“The pipeline has a peak execution rate of one instruction per clock cycle.”
Although the word "peak" is used, I was still surprised by the results I got from reading mcycle and minstret.
One example: a memset of 128 bytes took 516 instructions and 210879 cycles (!!!). That means the core is executing about 0.0024 instructions per cycle. That must be wrong! I tried scaling the tests up and down, but the result stayed consistent (no more than 0.003 IPC).
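To make the arithmetic behind that figure explicit, here is a minimal sketch in C. The helper names `ipc` and `cpi` are made up for illustration, and the inputs are simply the two counts quoted above.

```c
#include <stdint.h>

/* Instructions per cycle from two counter deltas.
 * Returns 0.0 when no cycles elapsed, to avoid dividing by zero. */
double ipc(uint64_t instructions, uint64_t cycles)
{
    return cycles ? (double)instructions / (double)cycles : 0.0;
}

/* Cycles per instruction, the reciprocal view of the same measurement.
 * Returns 0.0 when no instructions retired. */
double cpi(uint64_t instructions, uint64_t cycles)
{
    return instructions ? (double)cycles / (double)instructions : 0.0;
}
```

With the numbers above, `ipc(516, 210879)` is roughly 0.0024 and `cpi(516, 210879)` is roughly 409, i.e. each retired instruction is costing about four hundred cycles on average.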
And finally, the technical details. I’m using the core's hardware performance monitoring, which is also described in the manual:
“The mcycle CSR holds a count of the number of clock cycles the hart has executed since some arbitrary time in the past. The minstret CSR holds a count of the number of instructions the hart has retired since some arbitrary time in the past.”
At a high level, I’m doing this:
    write_csr(mcycleh, 0);
    write_csr(mcycle, 0);
    write_csr(minstreth, 0);
    write_csr(minstret, 0);
    [some stuff...]
    num_cycle = read_csr(mcycle);
    num_instr = read_csr(minstret);
This appears to compile correctly to:
    csrwi mcycleh,0
    csrwi mcycle,0
    csrwi minstreth,0
    csrwi minstret,0
    [some stuff...]
    csrr a0,mcycle
    csrr a1,minstret
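For comparison, a variant of this measurement could sample the counters before and after the code under test and subtract, instead of zeroing them. The sketch below is an assumption, not something taken from the SiFive manual: the names `perf_sample`, `perf_now` and `perf_diff` are hypothetical, and the high/low/high read loop is the usual way to get a consistent 64-bit value from a split counter on RV32. It is not claimed to explain the low IPC, since the low halves alone cannot overflow within about 210879 cycles.

```c
#include <stdint.h>

/* One snapshot of the two machine-mode counters. */
struct perf_sample {
    uint64_t cycles;
    uint64_t instret;
};

#if defined(__riscv) && (__riscv_xlen == 32)
/* Read one 32-bit CSR by name into 'out'. */
#define READ_CSR32(name, out) __asm__ volatile ("csrr %0, " #name : "=r"(out))

/* Read high, low, high again; retry if the high half changed,
 * which means the low half rolled over between the reads. */
static inline uint64_t read_mcycle64(void)
{
    uint32_t hi, lo, hi2;
    do {
        READ_CSR32(mcycleh, hi);
        READ_CSR32(mcycle, lo);
        READ_CSR32(mcycleh, hi2);
    } while (hi != hi2);
    return ((uint64_t)hi << 32) | lo;
}

static inline uint64_t read_minstret64(void)
{
    uint32_t hi, lo, hi2;
    do {
        READ_CSR32(minstreth, hi);
        READ_CSR32(minstret, lo);
        READ_CSR32(minstreth, hi2);
    } while (hi != hi2);
    return ((uint64_t)hi << 32) | lo;
}
#else
/* Off-target fallback so the sketch still compiles on a host machine. */
static inline uint64_t read_mcycle64(void)   { return 0; }
static inline uint64_t read_minstret64(void) { return 0; }
#endif

/* Take a snapshot of both counters. */
static inline struct perf_sample perf_now(void)
{
    struct perf_sample s = { read_mcycle64(), read_minstret64() };
    return s;
}

/* Elapsed counts between two snapshots; unsigned subtraction
 * stays correct even if a counter wrapped once in between. */
static inline struct perf_sample perf_diff(struct perf_sample start,
                                           struct perf_sample end)
{
    struct perf_sample d = { end.cycles - start.cycles,
                             end.instret - start.instret };
    return d;
}
```

Usage would be `start = perf_now();`, then the code under test, then `perf_diff(start, perf_now())`. The snapshot reads themselves add a few instructions and cycles to the result, so very short code sequences will look slightly worse than they are.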
Please, am I missing something? What should be the expected result here?