I have been trying to benchmark some assembly that I am writing. To do this, I first call the function several times to fill the instruction cache with its instructions. Then I read the cycle count using the following assembly: csrr a0, mcycle; ret.
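For reference, the measurement approach could be written in C roughly as follows. This is only a sketch: read_cycles, measure_min, sample_work, and sample_calls are names made up for this example, the mcycle read assumes the code runs in machine mode, and the clock() fallback is there only so that it builds on a non-RISC-V host.

```c
#include <stdint.h>
#include <time.h>

/* Read the cycle counter. On RISC-V this mirrors "csrr a0, mcycle; ret"
   and assumes machine mode (newer toolchains may need the zicsr extension
   in -march). Elsewhere clock() is a stand-in so the sketch still compiles. */
static inline uint32_t read_cycles(void)
{
#if defined(__riscv)
    unsigned long c;
    __asm__ volatile ("csrr %0, mcycle" : "=r"(c) : : "memory");
    return (uint32_t)c;
#else
    return (uint32_t)clock();
#endif
}

/* Hypothetical helper: call fn `warmup` times untimed (to warm the
   instruction cache), then time it `runs` times and return the smallest
   elapsed count seen. Returns UINT32_MAX if runs is 0. */
static uint32_t measure_min(uint32_t (*fn)(void), unsigned warmup, unsigned runs)
{
    uint32_t best = UINT32_MAX;
    for (unsigned i = 0; i < warmup; i++)
        (void)fn();
    for (unsigned i = 0; i < runs; i++) {
        uint32_t start = read_cycles();
        (void)fn();
        /* Unsigned subtraction stays correct across a counter wrap-around. */
        uint32_t elapsed = read_cycles() - start;
        if (elapsed < best)
            best = elapsed;
    }
    return best;
}

/* Hypothetical stand-in for dosomething(); counts how often it is called. */
static unsigned sample_calls;
static uint32_t sample_work(void)
{
    volatile uint32_t acc = 0;
    for (uint32_t i = 0; i < 100; i++)
        acc += i;
    sample_calls++;
    return acc;
}
```

Calling measure_min(sample_work, 6, 10) would mirror the six warm-up calls in my programs, but it repeats the timed call ten times and keeps the minimum, which should show whether a large number is a one-off or consistent.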
Now here comes the weird part. If I declare my assembly as an external function in C without parameters, it takes approximately as long as expected (around 60 cycles). However, if I add parameters, it suddenly takes around 2000 cycles. Comparing the objdump output of the two ELF files doesn't show anything worrying. The only difference is that the second version loads values from the stack.
To be clear, I have included below both the code that runs in the expected time and the code that takes longer.
This takes the expected 60 cycles.
#include <stdint.h>
#include <stdio.h>
extern uint32_t getcycles();
extern uint32_t dosomething();
int main() {
    uint32_t oldcount, newcount, x;
    unsigned char a = 10;
    unsigned char b = 50;
    uint32_t l;
    getcycles();
    dosomething();
    getcycles();
    dosomething();
    getcycles();
    dosomething();
    getcycles();
    dosomething();
    getcycles();
    dosomething();
    getcycles();
    dosomething();
    oldcount = getcycles();
    l = dosomething();
    newcount = getcycles();
    printf("This took %u cycles\n",newcount-oldcount);
    return 0;
}
And without changing the assembly this takes more than 2000 cycles:
#include <stdint.h>
#include <stdio.h>
extern uint32_t getcycles();
extern uint32_t dosomething(unsigned char a, unsigned char b);
int main() {
    uint32_t oldcount, newcount, x;
    unsigned char a = 10;
    unsigned char b = 50;
    uint32_t l;
    getcycles();
    dosomething(a,b);
    getcycles();
    dosomething(a,b);
    getcycles();
    dosomething(a,b);
    getcycles();
    dosomething(a,b);
    getcycles();
    dosomething(a,b);
    getcycles();
    dosomething(a,b);
    oldcount = getcycles();
    l = dosomething(a,b);
    newcount = getcycles();
    printf("This took %u cycles\n",newcount-oldcount);
    return 0;
}
Does anybody have any idea why this happens?
Kind regards,
mortalAmongstGods (mag)