Indeed, @tincman is right: the attribute in the C file does not apply.
By replacing .text with .section .itim in get_cycles(), I make sure that it is placed in the ITIM. Disassembling the resulting .elf file gives:
Disassembly of section .itim:
08000000 <run_in_itim>:
8000000: 1101 addi sp,sp,-32
8000002: ce22 sw s0,28(sp)
8000004: 1000 addi s0,sp,32
8000006: fea42623 sw a0,-20(s0)
800000a: fec42703 lw a4,-20(s0)
800000e: fec42783 lw a5,-20(s0)
8000012: 02f707b3 mul a5,a4,a5
8000016: 853e mv a0,a5
8000018: 4472 lw s0,28(sp)
800001a: 6105 addi sp,sp,32
800001c: 8082 ret
...
08000020 <getcycles>:
8000020: b80025f3 csrr a1,mcycleh
8000024: b0002573 csrr a0,mcycle
8000028: 8082 ret
However, I get the following output in this case: [2172, 46, 46, 46, 46, ..., 46]. So there is still a huge gap for the first measurement. Moreover, I get 46 cycles here, compared with 40 when get_cycles() is not running in the ITIM.
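To make the gap between the first measurement and the later ones easier to compare across runs, the first sample can be reported separately from the rest. The following is only a minimal sketch in plain C, assuming the per-iteration cycle deltas have been stored in an array; the names `cycle_stats` and `summarize` are hypothetical and not part of the original program.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary: first (cold) sample versus the remaining ones. */
struct cycle_stats {
    uint32_t first;      /* first measurement, including any cold-start cost */
    uint32_t steady_min; /* smallest of the remaining measurements */
    uint32_t steady_max; /* largest of the remaining measurements */
};

/* Split an array of cycle deltas into the first sample and the range
 * (min/max) of all later samples. Returns all zeros for an empty array. */
static struct cycle_stats summarize(const uint32_t *samples, size_t n)
{
    struct cycle_stats s = {0, 0, 0};

    if (n == 0)
        return s;
    s.first = samples[0];
    if (n == 1)
        return s;

    s.steady_min = samples[1];
    s.steady_max = samples[1];
    for (size_t i = 2; i < n; i++) {
        if (samples[i] < s.steady_min)
            s.steady_min = samples[i];
        if (samples[i] > s.steady_max)
            s.steady_max = samples[i];
    }
    return s;
}
```

For the samples {2172, 46, 46, 46}, this gives first = 2172 and steady_min = steady_max = 46, so the cold-start cost and the steady-state cost can be compared separately between the two placements.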
Investigating further, the problem does not seem to be linked to ITIM or DTIM usage: when the functions are not put in the ITIM (i.e. by removing __attribute__ ((noinline)) METAL_PLACE_IN_ITIM and using the .text section for get_cycles()), I get the following disassembly:
Disassembly of section .text:
20010380 <run_in_itim>:
20010380: 1101 addi sp,sp,-32
20010382: ce22 sw s0,28(sp)
20010384: 1000 addi s0,sp,32
20010386: fea42623 sw a0,-20(s0)
2001038a: fec42703 lw a4,-20(s0)
2001038e: fec42783 lw a5,-20(s0)
20010392: 02f707b3 mul a5,a4,a5
20010396: 853e mv a0,a5
20010398: 4472 lw s0,28(sp)
2001039a: 6105 addi sp,sp,32
2001039c: 8082 ret
2001039e <main>:
2001039e: 7171 addi sp,sp,-176
200103a0: d706 sw ra,172(sp)
200103a2: d522 sw s0,168(sp)
200103a4: 1900 addi s0,sp,176
200103a6: 4789 li a5,2
200103a8: fef42423 sw a5,-24(s0)
200103ac: 05b00513 li a0,91
200103b0: 20d9 jal 20010476 <putchar>
200103b2: fe042623 sw zero,-20(s0)
200103b6: a899 j 2001040c <main+0x6e>
200103b8: 2885 jal 20010428 <getcycles>
200103ba: fea42023 sw a0,-32(s0)
200103be: feb42223 sw a1,-28(s0)
200103c2: fe842503 lw a0,-24(s0)
200103c6: 3f6d jal 20010380 <run_in_itim>
200103c8: fca42e23 sw a0,-36(s0)
200103cc: 28b1 jal 20010428 <getcycles>
200103ce: 86aa mv a3,a0
200103d0: 872e mv a4,a1
200103d2: fe042583 lw a1,-32(s0)
200103d6: fe442603 lw a2,-28(s0)
200103da: 40b687b3 sub a5,a3,a1
200103de: 853e mv a0,a5
200103e0: 00a6b533 sltu a0,a3,a0
200103e4: 40c70833 sub a6,a4,a2
200103e8: 40a80733 sub a4,a6,a0
200103ec: 883a mv a6,a4
200103ee: fcf42823 sw a5,-48(s0)
200103f2: fd042a23 sw a6,-44(s0)
200103f6: fd042783 lw a5,-48(s0)
200103fa: 85be mv a1,a5
200103fc: 84818513 addi a0,gp,-1976 # 80000ae8 <__metal_driver_vtable_fixed_clock+0x8>
20010400: 280d jal 20010432 <iprintf>
20010402: fec42783 lw a5,-20(s0)
20010406: 0785 addi a5,a5,1
20010408: fef42623 sw a5,-20(s0)
2001040c: fec42703 lw a4,-20(s0)
20010410: 47fd li a5,31
20010412: fae7d3e3 bge a5,a4,200103b8 <main+0x1a>
20010416: 85018513 addi a0,gp,-1968 # 80000af0 <__metal_driver_vtable_fixed_clock+0x10>
2001041a: 22f1 jal 200105e6 <puts>
2001041c: 4781 li a5,0
2001041e: 853e mv a0,a5
20010420: 50ba lw ra,172(sp)
20010422: 542a lw s0,168(sp)
20010424: 614d addi sp,sp,176
20010426: 8082 ret
20010428 <getcycles>:
20010428: b80025f3 csrr a1,mcycleh
2001042c: b0002573 csrr a0,mcycle
20010430: 8082 ret
which returns [2446, 38, 38, 38, ..., 38].
@jimw I’m not sure that the ITIM/DTIM configuration is the cause here. In fact, the code seems to run faster when it is not placed in the ITIM: what could explain that behavior?