Function alignment with gcc

Hi,
While working on my own FPGA-optimized RISC-V implementation, I update the RISC-V toolchain from time to time.
BTW: I really enjoyed reading the “All Aboard” blog post series about the toolchain; it helps a lot.

Today I updated riscv-tools to the current master branch, which also brought gcc up to commit 65cb174.
Because my design is RV32IM only, I usually compile only a 32-bit toolchain.
Everything seems to work fine, but when creating a listing of the linked ELF I noticed that functions now seem to be padded/aligned (look at the “unimp” lines in the listing below). It seems that the compiler and/or linker align all functions to addresses that are a multiple of 8. It doesn’t harm anything besides consuming memory, but is there a way to switch it off? I have already tried -fno-align-functions, but it doesn’t help.

00010160 <_atoi_r>:
   10160:	00a00693          	li	a3,10
   10164:	00000613          	li	a2,0
   10168:	00d0606f          	j	16974 <_strtol_r>
   1016c:	0000                	unimp
	...

00010170 <__errno>:
   10170:	000677b7          	lui	a5,0x67
   10174:	bb87a503          	lw	a0,-1096(a5) # 66bb8 <_impure_ptr>
   10178:	00008067          	ret
   1017c:	0000                	unimp
	...

Regards
Thomas

What command did you use to build your toolchain?

It looks like the bug is in this patch, which may have a different hash on your system:

60cda8de81dc (“RISC-V: Avoid emitting invalid instructions in mixed RVC/no-RVC code”)

It’s safe for you to revert it: you don’t have RVC, so you won’t be mixing in RVC code. I’ll try to come up with a proper fix.

While we’re on the subject of alignment … can we get the stack alignment reduced in cases where it’s not necessary?

There seems to be support for reducing RV32 stack alignment at least from 16 bytes to 8, even at public interfaces: https://groups.google.com/a/groups.riscv.org/forum/#!msg/sw-dev/SFcqfIrRhQc/TL4IkMqWDQAJ

David Chisnall argues that the alignment at public interfaces should not change but that “a decent toolchain” can do whatever it wants for functions that are not externally visible, or even for all functions in embedded uses (i.e. whole program including runtime libraries linked into a single blob): https://groups.google.com/a/groups.riscv.org/d/msgid/sw-dev/3CCD5990-3CF4-4D49-94FB-7DD0202F1FAF%40cl.cam.ac.uk

It seems to me that this gives riscv32-unknown-elf-gcc license to drop down to 4-byte stack alignment when compiling for any -march that doesn’t include the D (double-precision floating-point) extension.

What do you think?

I’m using
../configure --prefix=/opt/riscv --with-arch=rv32i --with-abi=ilp32
and then sudo make

In the meantime I have noticed that my new compiler build also has problems with the -O3 option. They must be edge cases: I noticed strange bugs in my project that disappear when I build with -O2 or -Os, or with my old toolchain.

I also cross-checked that the same behavior occurs when running the code in Spike, so it is not related to my HW design.

Unfortunately I haven’t had time yet to track it down to the assembly level; the piece of code that fails is not trivial and was not written by me.

For the moment I reverted back to my “old” toolchain.

Newlib will not work with 4-byte stack alignment as soon as you use doubles somewhere in your code (I assume the same is true for glibc).

How does that happen, on a machine with no FP hardware and no other 64 bit registers?

With soft float :)
Believe me, I learned that the hard way a while ago, when my system startup code initialized SP to a 4-byte-aligned address. At the latest when you try a
printf("%f",....)
you will be in trouble.

By what mechanism?

I’m not sure what exactly you are asking. I can only say that gcc and/or newlib implement the whole va_arg machinery (the va_list type and the va_start, va_arg and va_end macros) in a way that needs 8-byte alignment on the stack when doubles are used. The compiler also automatically converts every float passed as a variadic argument to a double.
If you take a look at gcc’s stdarg.h, you can see that most definitions map to __builtin_* functions, so it is in some way “hard-coded” in the compiler.

But I think we are really moving away from my initial topic…

I’m saying that there is no instruction in the CPU that requires 8 byte alignment.

I suppose it’s possible that some software could decide to AND addresses given to it with ~0x7, or store addresses divided by eight and later get back the (hopefully) original by multiplying by eight.

But that’s extremely perverse software! At least unless you’re writing a LISP runtime or something similar that uses the lowest three bits as type tags. I’d assume Newlib isn’t doing anything like that.

Hi, I reverted the patch, and indeed it removed the issue.
Thanks.

This is basically what happens:
x = va_arg(ap, double);   /* ap as va_list */
generates code similar to
ap = (ap + 4) & ~7; x = *(double*)ap; ap += 8;   /* ap as int */

Previous example where this came up: “long long int vararg on 32-Bit machine with stack pointer divisible by 4” (issue #63 in riscv-collab/riscv-gcc on GitHub)

This seems like a bug in Newlib. It should use alignof(double)-1, not a literal 7.

This compiles with riscv32-unknown-elf-gcc (the Newlib toolchain) and currently prints 8. If va_arg used alignof correctly, the alignment of double could safely be changed to 4 on -march=rv32imafc and lower.

#include <stdio.h>
#include <stdalign.h>

int main(void){
  /* alignof yields a size_t, so %zu is the matching conversion */
  printf("%zu\n", alignof(double));
  return 0;
}

AFAIK stdarg.h is part of GCC, not newlib.

If you want 4-byte stack alignment, the ABI needs to be changed to not align doubles to even registers / 8-byte stack positions.

Yes, I know. The argument being made against me is that the ABI can’t be changed because va_arg assumes 8-byte alignment for doubles instead of checking. Which has got to be a bug.