Float compiler sqrt

It seems the riscv gcc compiler is not aware of sqrt support defined in ISA.
If I do a float x = sqrtf(a); // Where a is a float
It generates over 1000 lines of code to solve this. And the sqrt op is never called.

If I do float x = a* 1.2345f;
at the same pos it is only a couple of lines with float op mult.

I’m not seeing that.

I tried compiling…

#include <math.h>

float af(float n){return sqrtf(n);}
double ad(double n){return sqrt(n);}

and got quite a bit of code, but it does include sqrt instructions

bruce@nuc:~/riscv/tests$ riscv64-unknown-elf-gcc -O -c sqrt.c
bruce@nuc:~/riscv/tests$ riscv64-unknown-elf-objdump -d sqrt.o

sqrt.o:     file format elf64-littleriscv

Disassembly of section .text:

0000000000000000 <af>:
   0:   1101                    addi    sp,sp,-32
   2:   ec06                    sd      ra,24(sp)
   4:   a422                    fsd     fs0,8(sp)
   6:   58057453                fsqrt.s fs0,fa0  <==========
   a:   f00007d3                fmv.w.x fa5,zero
   e:   00102773                frflags a4
  12:   a0f517d3                flt.s   a5,fa0,fa5
  16:   00171073                fsflags a4
  1a:   e799                    bnez    a5,28 <.L4>

000000000000001c <.L1>:
  1c:   20840553                fmv.s   fa0,fs0
  20:   60e2                    ld      ra,24(sp)
  22:   2422                    fld     fs0,8(sp)
  24:   6105                    addi    sp,sp,32
  26:   8082                    ret

0000000000000028 <.L4>:
  28:   00000097                auipc   ra,0x0
  2c:   000080e7                jalr    ra
  30:   b7f5                    j       1c <.L1>

0000000000000032 <ad>:
  32:   1101                    addi    sp,sp,-32
  34:   ec06                    sd      ra,24(sp)
  36:   a422                    fsd     fs0,8(sp)
  38:   5a057453                fsqrt.d fs0,fa0  <==========
  3c:   f20007d3                fmv.d.x fa5,zero
  40:   00102773                frflags a4
  44:   a2f517d3                flt.d   a5,fa0,fa5
  48:   00171073                fsflags a4
  4c:   e799                    bnez    a5,5a <.L8>

000000000000004e <.L5>:
  4e:   22840553                fmv.d   fa0,fs0
  52:   60e2                    ld      ra,24(sp)
  54:   2422                    fld     fs0,8(sp)
  56:   6105                    addi    sp,sp,32
  58:   8082                    ret

000000000000005a <.L8>:
  5a:   00000097                auipc   ra,0x0
  5e:   000080e7                jalr    ra
  62:   b7f5                    j       4e <.L5>

Looks like it’s all just to enforce some strict math properties, because if I add -ffast-math then…

bruce@nuc:~/riscv/tests$ riscv64-unknown-elf-gcc -O -c sqrt.c -ffast-math
bruce@nuc:~/riscv/tests$ riscv64-unknown-elf-objdump -d sqrt.o

sqrt.o:     file format elf64-littleriscv

Disassembly of section .text:

0000000000000000 <af>:
   0:   58057553                fsqrt.s fa0,fa0
   4:   8082                    ret

0000000000000006 <ad>:
   6:   5a057553                fsqrt.d fa0,fa0
   a:   8082                    ret

Hi bruce,
Thanks for your reply!

I’m using riscv-none-embed-gcc.exe -march=rv32imaf -mabi=ilp32f -O3 -ffreestanding -nostartfiles -lm
not riscv64.
Do you get the same result using 32 and not support for double?

I do if I remove the -ffreestanding. Which seems like a perverse result. Without d it of course calls a runtime function for the double version.

bruce@nuc:~/riscv/tests$ riscv64-unknown-elf-gcc -march=rv32imaf -mabi=ilp32f -O3  -nostartfiles -ffast-math  -c sqrt.c
bruce@nuc:~/riscv/tests$ riscv64-unknown-elf-objdump -d sqrt.o | expand

sqrt.o:     file format elf32-littleriscv

Disassembly of section .text:

00000000 <af>:
   0:   58057553                fsqrt.s fa0,fa0
   4:   00008067                ret

00000008 <ad>:
   8:   00000317                auipc   t1,0x0
   c:   00030067                jr      t1

Thanks Bruce,
Now I got a good result, by removing -ffreestanding and adding -ffast-math.
However I still think that the previous result is a little worrying…
Do you know where I should report these kind of things?

You can draw the attention of @jimw to gcc and binutils stuff. He’s the expert on them here, and submits patches for them.

Perhaps Jim will be able to explain the logic behind this result.

By default, the compiler requires that a call to fsqrt sets errno. The sqrt instruction will not set errno, but is faster than the sqrt call. So we do some arithmetic and comparisons to determine if we can use the fsqrt result or if we have to call the sqrt function. That gives some extra code, but gives the best result as fast as possible.

If you want to use fenv instead or errno for checking errors, you can use the -fno-math-errno option, and then you just get a single fsqrt instruction.

-ffast-math will also work, as this implies -fno-math-errno along with a bunch of other stuff.

-ffreestanding disables all builtin functions. With __builtin_sqrtf disabled, we can no longer inline expand the sqrtf call to a fsqrt instruction. So there should always be a sqrtf call with -ffreestanding.

There is a related problem in newlib where we haven’t optimized the math library routines to be aware of RISC-V FP instructions yet. There is a riscv-newlib issue for this.

I don’t understand the comment about 1000 lines of code, unless maybe you are talking about the newlib sqrt implementation, which is indeed about 1000 lines of code, but that is a known problem with a known solution, that we just haven’t had a chance to implement yet.

I should probably clarify that -ffreestanding disables the optimization that converts sqrtf to __builtin_sqrttf, but __builtin_sqrtf is still available and can be called directly if you want. This is normally not recommended, but might be useful if you need -ffreestanding for other reasons.

Hi @jimw,

thanks for your reply.

With 1000 lines I mean the following:

if I have the following simple main function:
int main(void)
volatile int* pV1 = (int*)0x08000000;
volatile int* pRes = (int*)0x08000004;
float f = float(*pV1);
f = sqrtf(f);
*pRes = int(f);
return 0;

I build this with -march=rv32imaf -mabi=ilp32f -O3 -ffast-math -nostartfiles -lm then the result of the program is 10 instructions. And looks as expected.

If I remove -ffast-math flag the result is 1200 instructions for the program.

That is mostly the newlib sqrtf function as mentioned, which we can shorten with a little work.

Though I see another problem I missed when I first looked at this. In addition to supporting errno, newlib also supports matherr. Try “man matherr” for info on that. Unfortunately, the interface to matherr requires doubles, and if you have a f only target that means we need soft float routines to convert float to double. Matherr wasn’t designed for embedded targets, and a newlib redesign would be required to fix this, and maybe a deliberate choice to deviate from the relevant standards.

There is also a trivial bug in sqrtf. It is using “0.0/0.0” to create a NaN. That pulls in a soft-float 64-bit divide routine. We can fix that by changing newlib to use “0.0f/0.0f” here. There are multiple files that have this problem. I also see a few places that use x/0.0 to generate an infinity that should be x/0.0f. With that patch, the program is about 60% of its original size. I will work on finishing and submitting this patch.

Fixing riscv-newlib issue #40 would give us about another 5% reduction. A solution for the matherr problem would give us another 20% reduction. So we can get this to about 1/3 of the original size with some changes to the newlib math library.

Hi @jimw,
what I don’t understand is why the newlib sqrtf is called when there is a sqrtf support in the cpu.
And why removing -ffast_math calls newlib sqrtf instead the one supported in the cpu.

So my conclusion is that I can’t use the compiler without -ffast_math

The CPU’s sqrt function (as with all other math operations on RISC-V) is designed to be as fast as possible, for use with -ffast-math, and to not cause traps or interruptions of the program flow.

As another example, on many CPU types integer division by zero will cause a trap. On RISC-V division by zero returns an all-1s pattern. If division by zero is possible in your program and you want something else to happen in that case then you (or your compiler) needs to test the divisor itself and branch to appropriate code.

It’s the same with sqrt. The IEEE standard and the C and Posix standards require certain behaviours from sqrt. The raw RISC-V sqrt instruction does not provide those behaviours. If you want them then the compiler has to insert extra code. If you don’t want/need them then use -ffast-math.

In fact you’ll find -fno-math-errno is sufficient in this case to get just a sqrt instruction. There are other things enabled by -ffast-math that you may or may not want (but you probably do).

1 Like

As Bruce mentioned, sqrt() is supposed to set errno on error, and the fsqrt instruction does not. Try running “man sqrt” and reading the ERRORS section. You can choose to use the fenv.h functions to check for errors instead, in which case you can use the -fno-math-errno option to use the fsqrt instruction directly. See “man math_error” for info on the fenv.h style error checking.