Good morning,
I’m attempting to obtain total cycle counts for a few functions. To do so, I’ve been reading the mcycle CSR before a function call, reading the mcycle CSR after a function call, and calculating the difference between the two values.
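A minimal sketch of this before/after method in C follows. It assumes a bare-metal RISC-V target running in machine mode, as in the test program below. The names read_mcycle, cycles_elapsed, and measure_cycles are hypothetical helpers, and the non-RISC-V branch exists only so the file compiles on a host. Because the subtraction is done on uint32_t, the result is taken modulo 2^32. A single wraparound of the 32-bit counter between the two reads therefore still gives the right delta.

```c
#include <stdint.h>

/* Hypothetical helper: read the low 32 bits of the mcycle CSR.
   Assumes machine mode on a RISC-V target; on any other target it
   returns 0 so the file still compiles (illustration only).
   The "memory" clobber discourages the compiler from moving the read
   across surrounding memory accesses and calls. */
static inline uint32_t read_mcycle(void)
{
#if defined(__riscv)
    unsigned long value;
    __asm__ volatile ("csrr %0, mcycle" : "=r" (value) : : "memory");
    return (uint32_t)value;
#else
    return 0;
#endif
}

/* Unsigned 32-bit subtraction wraps modulo 2^32, so the delta is still
   correct if the counter overflows once between the two reads. */
static inline uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Hypothetical wrapper: cycles spent between the two reads, which
   includes fn() itself plus the call/return and one CSR read. */
static uint32_t measure_cycles(void (*fn)(void))
{
    uint32_t start = read_mcycle();
    fn();
    uint32_t end = read_mcycle();
    return cycles_elapsed(start, end);
}
```

Note that the measured value includes some fixed overhead (the call, the return, and the second CSR read), so a small function will always appear slightly more expensive than its body alone.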
The resulting cycle counts were greater than I expected. Therefore, I decided to try this method with a simple function:
#include <stdint.h>

volatile uint32_t a;
volatile uint32_t b;
volatile uint32_t time5;

int main(void)
{
    for (;;) {
        asm volatile ("csrr %0, mcycle" : "=r" (a));
        /* 20 nops between the two mcycle reads */
        asm volatile ("nop");
        asm volatile ("nop");
        asm volatile ("nop");
        asm volatile ("nop");
        asm volatile ("nop");
        asm volatile ("nop");
        asm volatile ("nop");
        asm volatile ("nop");
        asm volatile ("nop");
        asm volatile ("nop");
        asm volatile ("nop");
        asm volatile ("nop");
        asm volatile ("nop");
        asm volatile ("nop");
        asm volatile ("nop");
        asm volatile ("nop");
        asm volatile ("nop");
        asm volatile ("nop");
        asm volatile ("nop");
        asm volatile ("nop");
        asm volatile ("csrr %0, mcycle" : "=r" (b));
        time5 = b - a;
        a = 0;
        b = 0;
        asm volatile ("nop");
    }
    return 0;
}
Assuming each nop takes one cycle to execute, time5 should be 20 (plus the few cycles needed to read mcycle). Instead, the value was 37.
I also repeated this test with 20000 nop statements (written out individually, with no loop). The value in time5 was 2671265, not 20000.
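One way to produce a 20000-nop sequence with no loop, and without 20000 source lines, is the GNU assembler's .rept/.endr directive, which repeats the enclosed instruction at assembly time. The sketch below again assumes machine mode on a RISC-V target, and its helper names are hypothetical. cycles_per_op shows the arithmetic behind the two measurements. 37 cycles over 20 nops is about 1.85 cycles per nop, while 2671265 cycles over 20000 nops is about 133.6 cycles per nop.

```c
#include <stdint.h>

/* Hypothetical helper: low 32 bits of mcycle (machine mode, RISC-V only;
   returns 0 on other targets so the sketch compiles on a host). */
static inline uint32_t read_mcycle(void)
{
#if defined(__riscv)
    unsigned long value;
    __asm__ volatile ("csrr %0, mcycle" : "=r" (value) : : "memory");
    return (uint32_t)value;
#else
    return 0;
#endif
}

/* Emits 20000 straight-line nop instructions (no loop) between two
   mcycle reads and returns the elapsed cycle count. */
static uint32_t time_20000_nops(void)
{
    uint32_t start = read_mcycle();
    __asm__ volatile (".rept 20000\n\tnop\n\t.endr" ::: "memory");
    uint32_t end = read_mcycle();
    return end - start;
}

/* Average cycles per instruction for a measured block. */
static double cycles_per_op(uint32_t cycles, uint32_t ops)
{
    return (double)cycles / (double)ops;
}
```

Comparing the two per-nop figures shows that the overhead is not a fixed cost from the mcycle reads: the cost per instruction grows sharply with the length of the straight-line code.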
I checked the validity of the cycle counter by replacing the two mcycle reads and the time5 calculation with the following statements:
(*(volatile uint32_t *) (((0x10012000UL)) + ((0x0C)))) |= (0x1 << 16) ;
(*(volatile uint32_t *) (((0x10012000UL)) + ((0x0C)))) &= ~((0x1 << 16)) ;
These statements drive bit 16 of the GPIO output register (the pin marked IO0 on the board) high and low, respectively. With 20000 nop statements between them, I used an external oscilloscope to measure the length of each high pulse, which was approximately 41 milliseconds. In other words, the 20000 nop instructions took about 41 ms to execute. Dividing the 2671265 cycles reported by mcycle by 41 ms (equivalently, multiplying by 1 / 41 ms = 24.4 Hz) gives about 65.2 MHz, which is approximately the expected clock speed.
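The validation experiment can be sketched as follows. The base address 0x10012000, the 0x0C register offset, and bit 16 are taken from the register writes above; the macro and function names are hypothetical. pin_high and pin_low take a pointer to the output register, so they can be checked against an ordinary variable on a host. implied_clock_hz is the frequency calculation from the measurement: f = cycles / t = 2671265 / 0.041 s, or about 65.15 MHz.

```c
#include <stdint.h>

/* Values from the register writes above; the names are hypothetical. */
#define GPIO_BASE        0x10012000UL
#define GPIO_OUTPUT_VAL  0x0CUL
#define IO0_MASK         (UINT32_C(1) << 16)
#define GPIO_REG(offset) (*(volatile uint32_t *)(GPIO_BASE + (offset)))

/* Set the masked bit(s) of a GPIO output register. */
static inline void pin_high(volatile uint32_t *out, uint32_t mask)
{
    *out |= mask;
}

/* Clear the masked bit(s) of a GPIO output register. */
static inline void pin_low(volatile uint32_t *out, uint32_t mask)
{
    *out &= ~mask;
}

#if defined(__riscv)
/* On the board only: a high pulse on IO0 that spans 20000 nops. */
static void pulse_io0(void)
{
    pin_high(&GPIO_REG(GPIO_OUTPUT_VAL), IO0_MASK);
    __asm__ volatile (".rept 20000\n\tnop\n\t.endr" ::: "memory");
    pin_low(&GPIO_REG(GPIO_OUTPUT_VAL), IO0_MASK);
}
#endif

/* Clock frequency implied by a cycle count and a measured duration. */
static double implied_clock_hz(uint32_t cycles, double seconds)
{
    return (double)cycles / seconds;
}
```

For example, implied_clock_hz(2671265u, 0.041) evaluates to roughly 65.15e6, consistent with the 65.2 MHz figure above.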
As a result, I believe mcycle is returning valid cycle counts. It therefore appears that something is consuming extra clock cycles, although I’m unsure what it could be.
Other things I tried:
- Switched the optimization level from -O0 to -O3.
- Used gdb to step by instruction (via stepi).
- Set a watchpoint in gdb on the $pc (program counter) register. Execution did not jump to any interrupt or trap handlers.
- Set breakpoints on the functions in init.c, syscall.c, and drivers_sifive/plic.c. None were hit while executing main.