Ld, sd alignment?

What is the alignment requirement for an instruction such as "ld t0, 0(a0)", for example?

Does a0 have to be 8-byte aligned, 4-byte aligned, or is no alignment needed?

8 bytes. SiFive cores require natural alignment for all loads and stores. There is no misaligned/unaligned support, unless you write a trap handler to emulate it.
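
To make "natural alignment" concrete: an access of N bytes is naturally aligned when its address is a multiple of N. Here is a minimal C sketch of that check. The helper name is made up for illustration, and it assumes the access size is a power of two (1, 2, 4 or 8), which holds for lb/lh/lw/ld and sb/sh/sw/sd.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if addr is a multiple of size.
 * Assumes size is a power of two (1, 2, 4 or 8 bytes). */
static bool is_naturally_aligned(uintptr_t addr, size_t size)
{
    /* For a power-of-two size, the low bits of an aligned address are all zero. */
    return (addr & (size - 1)) == 0;
}
```

So for ld/sd the address in a0 (plus the immediate offset) must be a multiple of 8, for lw/sw a multiple of 4, for lh/sh a multiple of 2, and lb/sb can never be misaligned.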

Some time ago I got curious about the impact of unaligned memory access and wrote a little asm program that does 0xfffffff reads/writes with ld, lw, lh, lb, sd, sw, sh and sb at different offsets, and timed each run with the “time” utility.

These are the results, all times are in seconds:
(The forum doesn’t support tables, so I’ll use code tags.)

offset      ld      lw      lh      lb      sd      sw      sh      sb
0         0.45    0.45    0.45    0.45    0.45    0.45    0.45    0.45
1       158.87  111.66   90.05    0.45  152.76  106.36   85.14    0.45
2       158.76  111.64    0.45    0.45  153.57  106.35    0.45    0.45
3       158.83  111.64   90.05    0.45  151.80  106.39   85.15    0.45
4       159.04    0.45    0.45    0.45  151.79    0.45    0.45    0.45
5       158.85  111.66   90.02    0.45  151.81  106.35   85.15    0.45
6       158.43  111.64    0.45    0.45  153.34  106.34    0.45    0.45
7       158.90  111.65   90.07    0.45  151.80  106.37   85.14    0.45
8         0.45    0.45    0.45    0.45    0.45    0.45    0.45    0.45

As you can see, the impact is huge whenever the address is not aligned to the size of the read/write.
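
To put a number on "huge", the table can be turned into a slowdown factor and a rough per-access cost. This is only a back-of-the-envelope sketch: it assumes each run performs exactly 0xfffffff accesses and that all the extra time over the aligned baseline comes from handling the misaligned accesses. The function names are made up for illustration.

```c
#include <stdint.h>

/* Loop count from the post above: 0xfffffff = 268,435,455 accesses per run. */
#define ACCESS_COUNT 0xfffffffULL

/* Ratio of a misaligned run time to the aligned baseline (both in seconds). */
static double slowdown_factor(double misaligned_s, double aligned_s)
{
    return misaligned_s / aligned_s;
}

/* Extra time per access in nanoseconds, assuming (as a simplification)
 * that the whole difference is spent handling misaligned accesses. */
static double extra_ns_per_access(double misaligned_s, double aligned_s)
{
    return (misaligned_s - aligned_s) * 1e9 / (double)ACCESS_COUNT;
}
```

For the ld column at offset 1 (158.87 s against 0.45 s) this gives a factor of about 353 and roughly 590 ns of extra time per misaligned load.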

Whoa! Unaligned access time grows by a factor of about 350…

Thank you for sharing your results.

Yes, when you have no hardware support, M-mode firmware like OpenSBI will handle the alignment faults and emulate the accesses, so it’s always going to be orders of magnitude slower than having unaligned accesses supported in the hardware itself. Although they’re required to work in S-mode and above, you should just avoid them, especially in C, where dereferencing a misaligned pointer is undefined behaviour; use memcpy/memset and let the compiler inline an unaligned access if it decides that’s the best way to implement it.
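
A minimal sketch of the memcpy approach for a 64-bit value at an arbitrary address. The function names are made up for illustration. memcpy is defined for any source/destination alignment, so the compiler is free to emit byte accesses on a target without hardware misaligned support, or a single ld/sd where it knows that is safe.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 64-bit value from a possibly misaligned address.
 * Avoids the undefined behaviour of casting p to uint64_t * and dereferencing. */
static uint64_t load_u64_unaligned(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: write a 64-bit value to a possibly misaligned address. */
static void store_u64_unaligned(void *p, uint64_t v)
{
    memcpy(p, &v, sizeof v);
}
```

Typical use is parsing a packed buffer, e.g. load_u64_unaligned(buf + 3), instead of *(uint64_t *)(buf + 3).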

I was trying to make sense of this huge impact of unaligned memory access, and with Jessica’s hint that OpenSBI handles this, I found the code that actually does it: opensbi/sbi_misaligned_ldst.c (master branch of riscv-software-src/opensbi on GitHub).

I’m trying to figure out what happens, to see if I can explain the huge impact:

  1. The program tries to do an unaligned read/write.
  2. The CPU detects this and raises a trap, switching to M-mode.
  3. The OpenSBI trap handler gets called (opensbi/sbi_trap.c, master branch of riscv-software-src/opensbi on GitHub).
  4. The trap handler determines that the trap was caused by a misaligned access and calls the appropriate function.
  5. This function “emulates” the misaligned access using aligned reads/writes and shifts.
  6. The CPU switches back to U-mode and program execution resumes.
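
Step 5 can be illustrated with a small C sketch. This is not OpenSBI's code, just the general idea under simplifying assumptions: little-endian byte order (as on RISC-V), and single-byte accesses, which are always aligned, combined with shifts. The function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an OpenSBI function: emulate a misaligned load of
 * len bytes (1..8) by reading one byte at a time and shifting each byte into
 * place, assuming little-endian byte order. */
static uint64_t emulate_misaligned_load(const volatile uint8_t *addr, size_t len)
{
    uint64_t value = 0;
    for (size_t i = 0; i < len; i++) {
        value |= (uint64_t)addr[i] << (8 * i);
    }
    return value;
}

/* Hypothetical helper: emulate a misaligned store of len bytes (1..8) by
 * shifting each byte out of value and writing it separately. */
static void emulate_misaligned_store(volatile uint8_t *addr, uint64_t value, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        addr[i] = (uint8_t)(value >> (8 * i));
    }
}
```

A real handler has more to do than this: it also has to save and restore registers, decode the faulting instruction to find the access size and the register involved, and advance the program counter before returning. All of that runs for every single misaligned access, which fits the size of the slowdown measured above.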

Is there anything I’ve missed?

I guess this already explains a lot :)

Yes, that’s the idea