I have created a multi-threaded processor using FGMT (fine-grained multithreading), where a different thread is executed on each clock cycle. To test it and demonstrate its performance improvement, I would like to compile some C code suitable for it (the compiled code would be aware of the multithreading feature).
My understanding of FGMT is that each thread has its own PC and own set of registers. That means the compiled code is exactly the same as for a non-threaded processor.
You are absolutely correct: in FGMT, each thread has its own PC and set of registers.
However, if the compiler is aware of the number of threads, it would detect parts of the code that are independent and “split” the code according to the number of threads so that those parts execute in parallel.
e.g.
int main(void)
{
    int i = 0, j = 0, y = 0, z = 1;
    for( i = 0; i < 20; i = i + 1 ){
        z = z + i;
    }
    for( j = 0; j < 20; j = j + 1 ){
        y = y + 2;
    }
    return 0;
}
Since the two ‘for’ loops are completely independent, the compiler would ensure that thread 1 executes the code for the first loop while thread 2 executes the code for the second loop. That means two jump instructions would send threads 1 and 2 to the right instruction addresses (PC_thread1 and PC_thread2 respectively). In addition, a different sp (stack pointer) should be set for each thread, and the handling of gp (global pointer) should be aware of the multithreading feature so that no erroneous changes occur.
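To make the intended split concrete, here is a hand-written sketch of the two loops as separate entry points, one per hardware thread. The names thread1_entry, thread2_entry, result_z and result_y are made up for illustration, and the code that points each thread's PC and sp at its entry point is specific to the processor, so it is not shown.

```c
#include <stdint.h>

/* Hypothetical shared result slots. They are volatile so the final stores
   cannot be removed as dead code; the compiler may still fold each loop
   into a single constant store. */
volatile int32_t result_z;
volatile int32_t result_y;

/* Hypothetical entry point for thread 1: the first loop. */
void thread1_entry(void)
{
    int32_t z = 1;
    for (int32_t i = 0; i < 20; i = i + 1) {
        z = z + i;
    }
    result_z = z;
}

/* Hypothetical entry point for thread 2: the second loop. */
void thread2_entry(void)
{
    int32_t y = 0;
    for (int32_t j = 0; j < 20; j = j + 1) {
        y = y + 2;
    }
    result_y = y;
}
```

Each function uses only its own locals and its own result slot, which is what makes the two loops safe to run on different threads.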
So I am asking whether Freedom Studio has the ability to optimise the compiled code based on a given number of threads.
With -Os or -O2 the gcc in Freedom Studio optimises your code to:
0000000000000000 <main>:
0: 4501 li a0,0
2: 8082 ret
Perhaps you need a better example.
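The loops in the original example compute values that are never used, which is why the compiler can reduce main to returning 0. A minimal sketch of an example that should survive optimisation is below: the two loops read data that is only known at run time, and their results are returned. The function name two_independent_loops and its parameters are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Two independent loops over separate input arrays. Because the inputs
   arrive as arguments, the sums cannot be computed at compile time, and
   because the sums are returned, the loops are not dead code. */
int32_t two_independent_loops(const int32_t *a, const int32_t *b, size_t n)
{
    int32_t z = 0;
    int32_t y = 0;

    /* First loop: depends only on a[] and z. */
    for (size_t i = 0; i < n; i = i + 1) {
        z = z + a[i];
    }

    /* Second loop: depends only on b[] and y. */
    for (size_t j = 0; j < n; j = j + 1) {
        y = y + b[j];
    }

    return z + y;
}
```

If the caller is in the same file and passes constant arrays, the compiler may still inline and fold the call, so it is safer to compile this function on its own and inspect its disassembly.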
There is some support in gcc for auto-parallelization ("-ftree-parallelize-loops=n"). I don’t know how well it works, or whether it is enabled in the RISC-V Linux version. It isn’t enabled in the embedded compiler I use.
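For reference, the kind of loop that option targets looks roughly like the sketch below: every iteration writes a different element and reads nothing written by another iteration. The function name scale_add and the file name in the comment are made up, and whether a given toolchain actually parallelizes it is something to check in the generated code, not assume.

```c
#include <stddef.h>
#include <stdint.h>

/* Candidate loop for auto-parallelization. 'restrict' tells the compiler
   that out, a and b do not overlap, so iterations are independent.
   Possible build line (file name is hypothetical):
     gcc -O2 -ftree-parallelize-loops=2 -c scale_add.c
   The option depends on thread support in the target toolchain. */
void scale_add(int32_t *restrict out,
               const int32_t *restrict a,
               const int32_t *restrict b,
               size_t n)
{
    for (size_t i = 0; i < n; i = i + 1) {
        out[i] = 2 * a[i] + b[i];
    }
}
```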