I’m using the HiFive Unleashed with the kernel from https://github.com/sifive/freedom-u-sdk and a Fedora file system.
I was testing a multi-threaded application and noticed that most of the time it took X seconds to execute, but sometimes (roughly 1 run in 5 to 10) it took about 9X seconds.
I wrote a small multi-thread/multi-job test to reproduce the issue. It uses the “Hardware Performance Monitor” (HPM) counters to gather more information.
After going through all the events that can be monitored, I believe the problem is related to “Data cache/DTIM busy”. When the issue is visible, the ratio hpmcounter3/cycles, with hpmcounter3 configured to count “Data cache/DTIM busy”, is approximately 0.92.
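To make the ratio concrete: both values are deltas between two samples, so a sample in which the event counter advanced 920000 times over 1000000 cycles gives 0.92. If the event increments once per busy cycle (an assumption about how it counts), that means the data cache/DTIM was busy on roughly 92% of cycles. The sketch below only computes that ratio; `busy_ratio`, `is_slow_sample` and the 0.5 threshold are hypothetical names and an arbitrary illustrative value, not part of the test code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical threshold separating "normal" from "slow" samples.
 * 0.5 is an arbitrary illustrative value, not a measured one. */
#define SLOW_RATIO_THRESHOLD 0.5

/* Fraction of elapsed cycles on which the event counter (assumed to be
 * programmed to "Data cache/DTIM busy") incremented. Both arguments are
 * deltas between two consecutive samples. */
static double busy_ratio(uint64_t d_hpmcounter3, uint64_t d_cycle)
{
    assert(d_cycle != 0);
    return (double)d_hpmcounter3 / (double)d_cycle;
}

/* Returns 1 if the sample's busy ratio exceeds the illustrative threshold. */
static int is_slow_sample(uint64_t d_hpmcounter3, uint64_t d_cycle)
{
    return busy_ratio(d_hpmcounter3, d_cycle) > SLOW_RATIO_THRESHOLD;
}
```

For example, `busy_ratio(920000, 1000000)` evaluates to 0.92, which `is_slow_sample` flags.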
I looked at the assembly generated by GCC: the code should spend most of its time in a loop of lw, addiw, sw, sext.w, bnez and bgtu instructions that access data on the stack. My understanding is that forked processes and threads each use separate stack areas, so they should not invalidate each other’s cached stack data.
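One way to check the separate-stacks assumption, at least for the threaded case, is to record the address of a local variable in each thread and confirm that no two of them fall into the same cache line. This is a sketch under assumptions: it assumes 64-byte data-cache lines, and `stacks_share_cache_line`, `record_stack` and `NTHREADS` are hypothetical names. A barrier keeps every thread alive until all have recorded their address, so the C library cannot recycle a finished thread's stack for a later one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64u /* assumed L1 data-cache line size */
#define NTHREADS 4           /* hypothetical thread count */

static uintptr_t stack_addr[NTHREADS];
static pthread_barrier_t all_alive;

static void *record_stack(void *arg)
{
    volatile int local = 0; /* automatic variable on this thread's stack */
    stack_addr[(size_t)(uintptr_t)arg] = (uintptr_t)&local;
    /* Wait until every thread has recorded its address. */
    pthread_barrier_wait(&all_alive);
    return NULL;
}

/* Returns 1 if any two threads' stack variables share a cache line, else 0. */
static int stacks_share_cache_line(void)
{
    pthread_t tid[NTHREADS];
    int rc, shared = 0;

    rc = pthread_barrier_init(&all_alive, NULL, NTHREADS);
    assert(rc == 0);
    for (size_t t = 0; t < NTHREADS; t++) {
        rc = pthread_create(&tid[t], NULL, record_stack, (void *)(uintptr_t)t);
        assert(rc == 0);
    }
    for (size_t t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
    pthread_barrier_destroy(&all_alive);
    (void)rc;

    for (size_t a = 0; a < NTHREADS; a++)
        for (size_t b = a + 1; b < NTHREADS; b++)
            if (stack_addr[a] / CACHE_LINE_BYTES == stack_addr[b] / CACHE_LINE_BYTES)
                shared = 1;
    return shared;
}
```

With default thread stacks this is expected to return 0. It does not cover the forked case, where each process has its own address space.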
Does anyone have an idea why this could be happening, or what else I could do to understand what’s going on?
The code that I’m using to test can be found here: https://github.com/fabriziocabaleiro/riscv-tests
The issue appears randomly, in roughly 1 out of every 5 to 10 executions.
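Because the slow runs are rare outliers (about 9X against a typical X), one way to pick them out of a batch of runs is to compare each run against the median rather than the mean, so the slow runs do not inflate the baseline they are compared against. A minimal sketch; `find_slow_runs`, `median` and the 2× factor are hypothetical choices, not part of the test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n run times, computed on a sorted copy so the input is untouched. */
static double median(const double *t, size_t n)
{
    double *s = malloc(n * sizeof *s);
    assert(n > 0 && s != NULL);
    memcpy(s, t, n * sizeof *s);
    qsort(s, n, sizeof *s, cmp_double);
    double m = (n % 2) ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    free(s);
    return m;
}

/* Writes the indices of runs slower than factor * median into out
 * (which must hold n entries) and returns how many were found. */
static size_t find_slow_runs(const double *t, size_t n, double factor, size_t *out)
{
    double m = median(t, n);
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (t[i] > factor * m)
            out[k++] = i;
    return k;
}
```

For example, with made-up run times {1.0, 1.1, 0.9, 9.0, 1.0} and a factor of 2.0, the median is 1.0 and only index 3 is reported.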
The piece of code where the test should spend most of its time is listed below. “counter”, “loops” and “i” are local variables; I normally set loops to 1000000000 and counter to 10000. Most of the time the loop does nothing but increment and decrement a variable; once every 10000 iterations (counter), it reads the cycle, time, instret and event counters, which are then reported by the parent process or the initial thread.
```c
for (i = 0; i < loops; i++)
{
    if (counter == 0)
    {
        /* Every counter_start iterations, publish progress and counter deltas. */
        pta->i = i;
        pta->coreid = sched_getcpu();
        counter = counter_start;
        /* volatile: keep GCC from hoisting or merging the CSR reads.
         * "=r": csrrs can only write its result to a register. */
        asm volatile("csrrs %0, " STR(CSR_CYCLE) ", zero\n"
                     "csrrs %1, " STR(CSR_TIME) ", zero\n"
                     "csrrs %2, " STR(CSR_INSTRET) ", zero\n"
                     "csrrs %3, " STR(MRW_MHPMCOUNTER3) ", zero\n"
                     : "=r" (cycle), "=r" (time), "=r" (instret),
                       "=r" (hpmcounter3));
        pta->cycle = cycle - pcycle;
        pta->time = time - ptime;
        pta->instret = instret - pinstret;
        pta->hpmcounter3 = hpmcounter3 - phpmcounter3;
        pcycle = cycle;
        ptime = time;
        pinstret = instret;
        phpmcounter3 = hpmcounter3;
    }
    counter -= 1;
}
```
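The same counter reads can also be wrapped in a small helper. This is a sketch under assumptions: `struct counters` and `read_counters` are hypothetical names. It uses the standard `rdcycle`, `rdtime` and `rdinstret` pseudo-instructions plus `csrr` on `hpmcounter3`, which only works from user mode if the firmware and kernel have enabled it in `mcounteren`/`scounteren`. On non-RISC-V hosts it falls back to `clock_gettime`, so it can be compiled and tried on a development machine.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical container for one sample of the four counters. */
struct counters {
    uint64_t cycle, time, instret, hpmcounter3;
};

static void read_counters(struct counters *c)
{
#if defined(__riscv) && __riscv_xlen == 64
    /* volatile prevents the compiler from hoisting or merging the reads. */
    __asm__ volatile("rdcycle %0" : "=r"(c->cycle));
    __asm__ volatile("rdtime %0" : "=r"(c->time));
    __asm__ volatile("rdinstret %0" : "=r"(c->instret));
    /* Assumes user-mode access to hpmcounter3 has been enabled. */
    __asm__ volatile("csrr %0, hpmcounter3" : "=r"(c->hpmcounter3));
#else
    /* Fallback for non-RISC-V hosts: monotonic nanoseconds stand in for
     * cycle and time; instret and the event counter are unavailable. */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    c->cycle = c->time = (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
    c->instret = c->hpmcounter3 = 0;
#endif
}
```

Note that `cycle`, `instret` and the HPM counters are per-hart counters, so a delta computed after the thread migrated to another hart (visible as a change in `coreid`) may compare two different counters.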