How to get a backtrace in an RTOS on 64-bit RISC-V?

I am trying to dump a backtrace in an RTOS running on RISC-V. I found many exception handler examples, but most of them are for Arm.
Is there any way to dump a backtrace with something like
__builtin_return_address(int level)
or
#include <execinfo.h> backtrace()
so that I can save the backtrace and help developers debug?

On RISC-V, the only reliable way to backtrace is to use the EH unwind info. I don’t think that the glibc backtrace() function will work. The HP libunwind package can work, but it needs a lot of stack space and can fail if the stack is corrupted. libunwind works best if you run it from a second process to generate the backtrace of the first process. Then if the stack of the first process is corrupted, libunwind will detect that and handle it cleanly from the second process. This code probably has a lot of Linux assumptions, but if your target is POSIX you can probably make it work.
https://www.nongnu.org/libunwind/index.html
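For reference, here is a minimal sketch of walking the stack with the EH unwind info through the `_Unwind_Backtrace` interface that GCC's libgcc (and LLVM's libunwind) provide in `<unwind.h>`. It is not libunwind-specific, and it assumes the RTOS image links the unwinder, is compiled with `-funwind-tables` (or `-fasynchronous-unwind-tables`), and keeps the `.eh_frame` data where the unwinder can find it. The names `trace_buf`, `record_frame`, and `capture_backtrace` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unwind.h>

/* Hypothetical helper type: a caller-provided buffer of return addresses. */
struct trace_buf {
    uintptr_t *pcs;
    size_t max;
    size_t count;
};

/* Called by the unwinder once per frame, innermost frame first. */
static _Unwind_Reason_Code record_frame(struct _Unwind_Context *ctx, void *arg)
{
    struct trace_buf *buf = arg;

    if (buf->count >= buf->max)
        return _URC_END_OF_STACK;      /* buffer full: stop the walk */

    buf->pcs[buf->count++] = (uintptr_t)_Unwind_GetIP(ctx);
    return _URC_NO_REASON;             /* continue with the next frame */
}

/* Fill pcs[] with up to max program counters and return how many were stored. */
size_t capture_backtrace(uintptr_t *pcs, size_t max)
{
    struct trace_buf buf = { pcs, max, 0 };

    assert(pcs != NULL);
    _Unwind_Backtrace(record_frame, &buf);
    return buf.count;
}
```

The addresses can be saved raw and turned into function names offline with the ELF file, which keeps the on-target code small. The stack-space and corrupted-stack caveats above still apply, since the unwinder runs on the stack it is walking.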

Some targets, like x86, make unwinding easy because the previous stack pointer is at the bottom of the stack frame. But RISC-V puts it at the top, so you need to know the stack frame size. But the stack frame size can vary if you have alloca calls, or variable length arrays, or compiler optimizations like shrink wrapping. There is also an issue with tail call (aka sibling call) optimization which can remove a frame from the stack. The EH unwind info knows how to handle these cases.

The technique that GDB uses is to disassemble the function prologue to find the stack frame size, and then use this info to find the return address and previous stack pointer. However, this gets tricky in the presence of compiler optimization, and can fail if you have dynamic stack allocation in the middle of a function changing the stack frame size. This works most of the time, but not all of the time. GDB can find the function prologue because it gets the function start address from the DWARF debug info. If you have access to the ELF file, you can also get that by searching the symbol table info, which will have function names, start addresses, and sizes.
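A rough sketch of the prologue-scanning idea, not GDB's actual implementation. It assumes the function start address is already known, that the code uses only 32-bit encodings (no compressed instructions), and that the prologue has the common `addi sp, sp, -N` followed by `sd ra, off(sp)` shape. `prologue_info` and `scan_prologue` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
struct prologue_info {
    int64_t frame_size;  /* bytes subtracted from sp, 0 if none was seen */
    int64_t ra_offset;   /* offset of saved ra from the new sp, -1 if none */
};

/* Sign-extend a value that occupies the low `bits` bits of v. */
static int64_t sign_extend(uint32_t v, unsigned bits)
{
    uint32_t m = 1u << (bits - 1);
    return (int64_t)(v ^ m) - (int64_t)m;
}

/* Scan up to n_insns 32-bit instructions starting at the function entry. */
struct prologue_info scan_prologue(const uint32_t *code, size_t n_insns)
{
    struct prologue_info info = { 0, -1 };

    assert(code != NULL);
    for (size_t i = 0; i < n_insns; i++) {
        uint32_t insn   = code[i];
        uint32_t opcode = insn & 0x7f;
        uint32_t rd     = (insn >> 7) & 0x1f;
        uint32_t funct3 = (insn >> 12) & 0x7;
        uint32_t rs1    = (insn >> 15) & 0x1f;
        uint32_t rs2    = (insn >> 20) & 0x1f;

        /* Branch, jalr, or jal: we have left straight-line prologue code. */
        if (opcode == 0x63 || opcode == 0x67 || opcode == 0x6f)
            break;

        if (opcode == 0x13 && funct3 == 0 && rd == 2 && rs1 == 2) {
            /* addi sp, sp, imm: a negative imm allocates the frame. */
            int64_t imm = sign_extend(insn >> 20, 12);
            if (imm >= 0)
                break;               /* sp is being restored, not allocated */
            info.frame_size += -imm;
        } else if (opcode == 0x23 && funct3 == 3 && rs1 == 2 && rs2 == 1) {
            /* sd ra, imm(sp): S-type immediate is split across two fields. */
            uint32_t raw = ((insn >> 25) << 5) | ((insn >> 7) & 0x1f);
            info.ra_offset = sign_extend(raw, 12);
        }

        if (info.frame_size != 0 && info.ra_offset >= 0)
            break;                   /* found everything this sketch looks for */
    }
    return info;
}
```

With a result in hand, the caller's sp would be `sp + frame_size` and the return address would be loaded from `sp + ra_offset`. Frames larger than 2 KiB, dynamic allocation, and shrink wrapping all produce prologues this sketch does not recognize, which is the "not all of the time" part.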

A simpler alternative to HP libunwind might be the GCC libbacktrace library. This has no cross process support, so can only work inside the process you want to backtrace, and has some of the same issues with needing lots of stack space, and a non-corrupted stack. I haven’t tried using this on RISC-V but in theory it should work.
https://gcc.gnu.org/git/?p=gcc.git;a=blob_plain;f=libbacktrace/README;hb=HEAD

I once worked at a company that used function epilogues instead of function prologues. You can search forward from the current pc to find the first return instruction, and then assume that is the function epilogue. However, this fails more often than using the prologue, as a function can have more than one epilogue. This also assumes that functions are contiguous in memory. There are compiler optimizations like hot/cold basic block optimizations that can move blocks to other memory regions. And optimizations like basic block reordering to reduce the number of branches which means the first epilogue you find might not be the right one for this basic block. But anyways, finding the epilogue is easier than finding the prologue, as you don’t need DWARF debug info or the original ELF file symbol table.
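As an illustration of the forward search only (not the code that company used), the sketch below scans from a given instruction boundary for the first `ret`, accepting both the 32-bit encoding (`jalr x0, 0(ra)`, 0x00008067) and the compressed one (`c.jr ra`, 0x8082). The function name `find_first_ret` and the byte-buffer interface are assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the byte offset of the first `ret` at or after code[0], or -1 if
 * none is found within len bytes. code[0] must be an instruction boundary,
 * which holds when starting from a valid pc.
 */
long find_first_ret(const uint8_t *code, size_t len)
{
    size_t off = 0;

    assert(code != NULL);
    while (off + 2 <= len) {
        /* RISC-V instructions are little-endian 16-bit parcels. */
        uint32_t lo = (uint32_t)code[off] | ((uint32_t)code[off + 1] << 8);

        if ((lo & 0x3) != 0x3) {
            /* Low bits != 11: a 16-bit compressed instruction. */
            if (lo == 0x8082)
                return (long)off;        /* c.jr ra */
            off += 2;
        } else {
            /* Low bits == 11: a 32-bit instruction. */
            if (off + 4 > len)
                break;
            uint32_t insn = lo | ((uint32_t)code[off + 2] << 16)
                               | ((uint32_t)code[off + 3] << 24);
            if (insn == 0x00008067)
                return (long)off;        /* jalr x0, 0(ra) */
            off += 4;
        }
    }
    return -1;
}
```

The instructions just before the returned offset would then be decoded for the `ld ra, off(sp)` and `addi sp, sp, N` that undo the frame. A function that ends in a tail call has no `ret` at all, so the scan can run on into the next function, which is one more way this approach fails.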

Hmm. I’m not aware of RISC-V code saving the previous stack pointer at all. Code normally decrements the stack pointer at (or near) the start of the function, and increments it before returning, but I don’t recall ever seeing a function save x2.

Non-leaf functions do of course save the return address, along with any s-registers used. However the RISC-V specification documents (e.g. ABI) specify what must be saved and restored, but are to the best of my knowledge (and I’ve read them with exactly this question in mind) completely silent as to the layout of saved registers within the stack frame. You might know what gcc or llvm do at the moment, but it is completely arbitrary and can change at any time.

I don’t think there’s any way to reliably do a backtrace on RISC-V without access to metadata. Or code generation conventions that are not documented at present (if they exist).

I meant the previous frame pointer, which is stored at a variable offset from the current frame pointer. Hence the need for metadata, or disassembling code to find it.

The frame pointer frequently gets optimized away on RISC-V. Some other architectures preserve it more often than RISC-V. Some applications use -fno-omit-frame-pointer to preserve it to make generating backtraces easier. But that doesn’t help as much on RISC-V, because the normal frame layout makes it hard to find the previous frame pointer.