How to get a backtrace in an RTOS based on RISC-V 64?

On RISC-V, the only reliable way to backtrace is to use the EH unwind info. I don’t think that the glibc backtrace() function will work. The HP libunwind package can work, but it needs a lot of stack space and can fail if the stack is corrupted. libunwind works best if you run it from a second process to generate the backtrace of the first process. Then if the stack of the first process is corrupted, libunwind will detect that and handle it cleanly from the second process. The code probably has a lot of Linux assumptions, but if your target is POSIX you can probably make it work.
https://www.nongnu.org/libunwind/index.html
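
To make "use the EH unwind info" a bit more concrete, here is a minimal sketch that walks the stack with `_Unwind_Backtrace` from the GCC runtime (`unwind.h`) rather than with libunwind itself. It assumes the code is built with unwind tables (for example `-funwind-tables`) and that your RTOS toolchain keeps the `.eh_frame` section and links the libgcc unwinder; I have not verified that on any particular RTOS. The names `trace_buf`, `record_frame`, `capture_backtrace`, and `print_backtrace` are made up for this example.

```c
#include <stdint.h>
#include <stdio.h>
#include <unwind.h>

#define MAX_FRAMES 32

/* Fixed-size buffer so that capturing a trace needs no heap allocation. */
struct trace_buf {
    uintptr_t pc[MAX_FRAMES];
    int count;
};

/* Called by the unwinder once per frame, innermost frame first. */
static _Unwind_Reason_Code record_frame(struct _Unwind_Context *ctx, void *arg)
{
    struct trace_buf *buf = arg;

    if (buf->count >= MAX_FRAMES)
        return _URC_END_OF_STACK;   /* any value other than _URC_NO_REASON stops the walk */
    buf->pc[buf->count++] = (uintptr_t)_Unwind_GetIP(ctx);
    return _URC_NO_REASON;
}

/* Fill buf with the return addresses of the current call chain. */
int capture_backtrace(struct trace_buf *buf)
{
    buf->count = 0;
    _Unwind_Backtrace(record_frame, buf);
    return buf->count;
}

/* Print raw addresses; map them to names offline with addr2line or a map file. */
void print_backtrace(const struct trace_buf *buf)
{
    for (int i = 0; i < buf->count; i++)
        printf("#%d 0x%lx\n", i, (unsigned long)buf->pc[i]);
}
```

Calling `capture_backtrace(&buf)` from a fault handler and then `print_backtrace(&buf)` gives raw addresses only. Like libunwind, this runs on the stack being unwound, so it has the same problem with a corrupted stack.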

Some targets, like x86, make unwinding easy because the previous stack pointer is at the bottom of the stack frame. RISC-V puts it at the top, so you need to know the stack frame size. However, the stack frame size can vary if you have alloca calls, variable length arrays, or compiler optimizations like shrink wrapping. There is also an issue with tail call (aka sibling call) optimization, which can remove a frame from the stack. The EH unwind info knows how to handle these cases.

The technique that GDB uses is to disassemble the function prologue to find the stack frame size, and then use this info to find the return address and previous stack pointer. However, this gets tricky in the presence of compiler optimization, and can fail if you have dynamic stack allocation in the middle of a function changing the stack frame size. This works most of the time, but not all of the time. GDB can find the function prologue because it gets the function start address from the DWARF debug info. If you have access to the ELF file, you can also get that by searching the symbol table, which has function names, start addresses, and sizes.
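
As a rough illustration of the prologue approach (not GDB's actual code), the sketch below scans the first instructions of a function for the stack allocation (`addi sp, sp, -N`) and the save of the return address (`sd ra, off(sp)`), in both the 32-bit and compressed encodings. It assumes a simple prologue: one sp adjustment, no frames larger than the immediate range, and no dynamic allocation. `prologue_info` and `scan_prologue` are names invented for this example, and the function start address is assumed to come from the symbol table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* What the scan recovered from the prologue. */
struct prologue_info {
    int64_t frame_size;  /* bytes subtracted from sp; 0 if not seen */
    int64_t ra_offset;   /* offset of the saved ra from the new sp; -1 if not seen */
};

/* Sign-extend the low `bits` bits of v. */
static int64_t sext(uint32_t v, unsigned bits)
{
    uint32_t sign = 1u << (bits - 1);
    v &= (sign << 1) - 1;
    return (int64_t)(v ^ sign) - (int64_t)sign;
}

/* Only the first negative sp adjustment is treated as the frame allocation. */
static void note_sp_adjust(struct prologue_info *out, int64_t imm)
{
    if (imm < 0 && out->frame_size == 0)
        out->frame_size = -imm;
}

/* Decode the compressed (16-bit) forms that commonly appear in prologues. */
static void decode16(uint16_t insn, struct prologue_info *out)
{
    unsigned op = insn & 3, funct3 = insn >> 13, rd = (insn >> 7) & 0x1f;

    if (op == 1 && funct3 == 0 && rd == 2) {            /* c.addi sp, imm */
        uint32_t imm = ((insn >> 12) & 1) << 5 | ((insn >> 2) & 0x1f);
        note_sp_adjust(out, sext(imm, 6));
    } else if (op == 1 && funct3 == 3 && rd == 2) {     /* c.addi16sp imm */
        uint32_t imm = ((insn >> 12) & 1) << 9 | ((insn >> 6) & 1) << 4 |
                       ((insn >> 5) & 1) << 6 | ((insn >> 3) & 3) << 7 |
                       ((insn >> 2) & 1) << 5;
        note_sp_adjust(out, sext(imm, 10));
    } else if (op == 2 && funct3 == 7 && ((insn >> 2) & 0x1f) == 1) {
        /* c.sdsp ra, imm(sp): the immediate is unsigned and scaled by 8 */
        out->ra_offset = ((insn >> 10) & 7) << 3 | ((insn >> 7) & 7) << 6;
    }
}

/* Decode the 32-bit forms: addi sp, sp, imm and sd ra, imm(sp). */
static void decode32(uint32_t insn, struct prologue_info *out)
{
    unsigned opcode = insn & 0x7f, funct3 = (insn >> 12) & 7;
    unsigned rd = (insn >> 7) & 0x1f, rs1 = (insn >> 15) & 0x1f;
    unsigned rs2 = (insn >> 20) & 0x1f;

    if (opcode == 0x13 && funct3 == 0 && rd == 2 && rs1 == 2)
        note_sp_adjust(out, sext(insn >> 20, 12));
    else if (opcode == 0x23 && funct3 == 3 && rs1 == 2 && rs2 == 1)
        out->ra_offset = sext((insn >> 25) << 5 | ((insn >> 7) & 0x1f), 12);
}

/* Scan up to n_halfwords of code from the function start.
 * Returns true once both the frame size and the ra save slot are known. */
bool scan_prologue(const uint16_t *code, size_t n_halfwords,
                   struct prologue_info *out)
{
    out->frame_size = 0;
    out->ra_offset = -1;
    for (size_t i = 0; i < n_halfwords; ) {
        if ((code[i] & 3) != 3) {           /* low bits != 11: 16-bit instruction */
            decode16(code[i], out);
            i += 1;
        } else {
            if (i + 1 >= n_halfwords)
                break;
            decode32(code[i] | (uint32_t)code[i + 1] << 16, out);
            i += 2;
        }
        if (out->frame_size != 0 && out->ra_offset >= 0)
            return true;
    }
    return false;
}
```

With the result in hand, the caller's sp is the current sp plus `frame_size`, and the return address is the 64-bit value stored at the current sp plus `ra_offset`. A leaf function that never saves ra makes `scan_prologue` return false, in which case the return address is still in the ra register.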

A simpler alternative to HP libunwind might be the GCC libbacktrace library. It has no cross-process support, so it can only work inside the process you want to backtrace, and it has some of the same issues: it needs lots of stack space and a non-corrupted stack. I haven’t tried using this on RISC-V, but in theory it should work.
https://gcc.gnu.org/git/?p=gcc.git;a=blob_plain;f=libbacktrace/README;hb=HEAD

I once worked at a company that used function epilogues instead of function prologues. You can search forward from the current pc to find the first return instruction, and then assume that is the function epilogue. However, this fails more often than using the prologue, as a function can have more than one epilogue. It also assumes that functions are contiguous in memory, and there are compiler optimizations like hot/cold basic block optimizations that can move blocks to other memory regions. Optimizations like basic block reordering, which reduce the number of branches, also mean that the first epilogue you find might not be the right one for the current basic block. Still, finding the epilogue is easier than finding the prologue, as you don’t need DWARF debug info or the original ELF file symbol table.
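
A minimal sketch of that forward search, under the assumptions just listed (contiguous function, first return belongs to the current block, no dynamic stack allocation): step through instructions from the pc, remember the most recent positive sp adjustment, and stop at the first `ret` (`jalr x0, 0(ra)`, or the compressed `c.jr ra`). This is my own illustration, not the code that company used, and `frame_size_from_epilogue` is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of v. */
static int64_t sext(uint32_t v, unsigned bits)
{
    uint32_t sign = 1u << (bits - 1);
    v &= (sign << 1) - 1;
    return (int64_t)(v ^ sign) - (int64_t)sign;
}

/* sp adjustment made by a 16-bit instruction, or 0 if it does not adjust sp. */
static int64_t sp_adjust16(uint16_t insn)
{
    unsigned op = insn & 3, funct3 = insn >> 13, rd = (insn >> 7) & 0x1f;

    if (op != 1 || rd != 2)
        return 0;
    if (funct3 == 0)                                    /* c.addi sp, imm */
        return sext(((insn >> 12) & 1) << 5 | ((insn >> 2) & 0x1f), 6);
    if (funct3 == 3)                                    /* c.addi16sp imm */
        return sext(((insn >> 12) & 1) << 9 | ((insn >> 6) & 1) << 4 |
                    ((insn >> 5) & 1) << 6 | ((insn >> 3) & 3) << 7 |
                    ((insn >> 2) & 1) << 5, 10);
    return 0;
}

/* sp adjustment made by a 32-bit addi sp, sp, imm, or 0 for anything else. */
static int64_t sp_adjust32(uint32_t insn)
{
    if ((insn & 0x000fffffu) == 0x00010113u)
        return sext(insn >> 20, 12);
    return 0;
}

/* Scan forward from pc for the first return instruction.
 * Returns the frame size released just before it (0 if sp was not adjusted),
 * or -1 if no return was found within max_halfwords. */
int64_t frame_size_from_epilogue(const uint16_t *pc, size_t max_halfwords)
{
    int64_t last_adjust = 0;

    for (size_t i = 0; i < max_halfwords; ) {
        int64_t adj;

        if ((pc[i] & 3) != 3) {                 /* 16-bit instruction */
            if (pc[i] == 0x8082)                /* c.jr ra (compressed ret) */
                return last_adjust;
            adj = sp_adjust16(pc[i]);
            i += 1;
        } else {                                /* 32-bit instruction */
            uint32_t insn;

            if (i + 1 >= max_halfwords)
                break;
            insn = pc[i] | (uint32_t)pc[i + 1] << 16;
            if (insn == 0x00008067u)            /* jalr x0, 0(ra) (ret) */
                return last_adjust;
            adj = sp_adjust32(insn);
            i += 2;
        }
        if (adj > 0)
            last_adjust = adj;
    }
    return -1;
}
```

The returned value is the amount to add to the current sp to get the caller's sp. Finding where ra was saved would need the matching `ld ra, off(sp)` decoded in the same loop, which I left out to keep the sketch short.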
