Generating core dumps on bare metal RISC-V


(Robin Kuzmin) #1

What means are currently available to generate and analyze the core dumps of bare metal RISC-V firmware?

(Sorry, as a newbie I can only use 2 web-links in my first post(s))
I work with my employer's fork of the minimalistic RISC-V CPU PicoRV32 (github . com/cliffordwolf/picorv32). The Verilog-based bit image of the CPU is flashed to an FPGA connected over JTAG+USB to a personal computer, where I use OpenOCD (again, my employer's fork). I use the RISC-V GNU tool-chain (riscv32-unknown-elf-gcc, riscv32-unknown-elf-gdb) to compile and debug the bare metal RISC-V firmware. This works well.
I need to be able to generate and analyze core dumps of the bare metal RISC-V firmware (see man 5 core or man7 . org/linux/man-pages/man5/core.5.html). For example, at the riscv32-unknown-elf-gdb command prompt I want to enter gcore

(gdb) gcore

and get the core dump file, then analyze that core dump with riscv32-unknown-elf-gdb (search for “core” here - sourceware . org/gdb/current/onlinedocs/gdb/Files.html#index-core-dump-file).
When I do (gdb) gcore, riscv32-unknown-elf-gdb tells me Can't create a corefile, and most importantly it does not even try, i.e. it does not send any requests to the gdb server (in OpenOCD). This makes me think that riscv32-unknown-elf-gdb can neither generate nor analyze (bare metal) core files. Is my suspicion correct?

If riscv32-unknown-elf-gdb can in fact analyze (bare metal) core dump files, then what means are available to generate them? Are there any C/C++ libraries? Or will I have to implement the generation myself (e.g. searching for “core” in man 5 elf - man7 . org/linux/man-pages/man5/elf.5.html)?

I also plan to generate core dump files when the RISC-V firmware does something wrong, e.g. dereferences a NULL pointer. My code in OpenOCD will notice that and generate the core dump file.


(Jim Wilson) #2

If you look at the docs for the gcore command in the gdb manual, you will see
Note that this command is implemented only for some systems (as of
this writing, GNU/Linux, FreeBSD, Solaris, and S390).

The command does work for riscv*-linux, but the support is primarily in the gdb/linux-tdep.c file with just a small hook riscv_linux_iterate_over_regset_sections in the riscv-linux-tdep.c file.

Core file formats are OS specific, so if there is no target OS, there is no core file format for it. You can probably define your own, or maybe copy the linux core file format, which is actually defined by the linux kernel, and gdb just follows the linux kernel. It might be easier to do this outside gdb though.

There is unfortunately no dedicated riscv gdb maintainer, and I just do riscv linux support as time permits, which is not very often. And another guy does riscv elf support as time permits, which is not very often. So it is unlikely anyone will implement something for you.


(Robin Kuzmin) #3

Thank you for your reply, Jim.
So far I have the following understanding.

The GNU tool-chain that I have consists of 2 parts:

  1. Tool-chain for bare metal RISC-V firmware/software (riscv32-unknown-elf-gcc / riscv32-unknown-elf-gdb).
  2. Tool-chain for firmware/software for Linux on RISC-V (riscv32-unknown-linux-gnu-gcc / riscv32-unknown-linux-gnu-gdb).

I’m working with item 1. Item 1 tool-chain cannot generate the core dump files and cannot analyze the core dump files (no support of the core dump analysis in riscv32-unknown-elf-gdb). Even if I manage to generate in some way the core dump file for a bare metal RISC-V firmware, I still will not be able to analyse such a core dump file with riscv32-unknown-elf-gdb until someone (or myself) adds a core dump analysis support to riscv32-unknown-elf-gdb.

Correct?


(Jim Wilson) #4

Correct. Core dump files are a Unix feature.


(Robin Kuzmin) #5

Jim,
I’m trying to generate the core dump for bare metal RISC-V firmware, and then to try to analyze the core dump with the Linux tool-chain (riscv32-unknown-linux-gnu-gdb). I.e. I’m trying to generate the core dump as if it was generated by Linux.
In the tool-chain’s file gdb/riscv-linux-tdep.c I see the fragment

/* Define the general register mapping.  The kernel puts the PC at offset 0,
   gdb puts it at offset 32.  Register x0 is always 0 and can be ignored.
   Registers x1 to x31 are in the same place.  */

static const struct regcache_map_entry riscv_linux_gregmap[] =
{
  { 1,  RISCV_PC_REGNUM, 0 },
  { 31, RISCV_RA_REGNUM, 0 }, /* x1 to x31 */
  { 0 }
};

This fragment makes me think that GDB expects the “.reg” section of the core dump to hold the general purpose registers with the value of register X0 (Zero) omitted and the value of the Program Counter stored in its place.
GDB then expects the values of registers X1 - X31 to follow in the “.reg” section of the core dump.

Is this guess correct?


(Jim Wilson) #6

Yes, that is correct, pc followed by x1 to x31 in the .reg section of the core dump file.
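
The layout confirmed above can be sketched in C as follows. This is only a minimal sketch, not code from gdb or OpenOCD: pack_reg_section and RV32_NUM_GREGS are hypothetical names, and it assumes 32-bit registers and a little-endian host, so that the resulting array can be written to the core file byte for byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of 32-bit slots in the .reg payload: pc plus x1..x31. */
enum { RV32_NUM_GREGS = 32 };

/* Hypothetical helper.  x[] holds x0..x31 as read from the target and
   pc is the program counter.  out[] receives the values in the order
   gdb expects: pc in slot 0 (where x0 would be), then x1..x31.  */
static void pack_reg_section(const uint32_t x[32], uint32_t pc,
                             uint32_t out[RV32_NUM_GREGS])
{
    assert(x != NULL && out != NULL);
    out[0] = pc;                                    /* slot 0: pc, x0 is dropped */
    memcpy(&out[1], &x[1], 31 * sizeof(uint32_t));  /* slots 1..31: x1..x31 */
}
```

The value of x0 is never copied, which matches the comment in riscv-linux-tdep.c that x0 is always 0 and can be ignored.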


(Robin Kuzmin) #7

Jim,
I have the impression that the core dump analysis does not work in GDB for the 32-bit platform (riscv32-unknown-linux-gnu-gdb).

Explanation.
In riscv-gnu-toolchain/riscv-gdb/bfd/elfnn-riscv.c I step through the fragment:

3997        /* Support for core dump NOTE sections.  */
3998 
3999        static bfd_boolean
4000        riscv_elf_grok_prstatus (bfd *abfd, Elf_Internal_Note *note)
4001        {
4002          switch (note->descsz)
4003            {
4004              default:
4005                return FALSE;
4006 
4007              case PRSTATUS_SIZE:  /* sizeof(struct elf_prstatus) on Linux/RISC-V.  */
4008                /* pr_cursig */
4009                elf_tdata (abfd)->core->signal
4010                  = bfd_get_16 (abfd, note->descdata + PRSTATUS_OFFSET_PR_CURSIG);

The lines 4002 (switch) and 4007 (case) force the core dump file to have the following set of values in a Note (see Elf32_Nhdr):

  • n_type: NT_PRSTATUS;
  • n_descsz: PRSTATUS_SIZE

(in order to get to line 4007).

However immediately prior to this code fragment I see the following:

3975        #if ARCH_SIZE == 32
3976        # define PRSTATUS_SIZE                  0 /* FIXME */
            . . .
3985        #else
3986        # define PRSTATUS_SIZE                  376
            . . .
3995        #endif

The line 3976 sets the PRSTATUS_SIZE to 0 (for 32-bit platform only).
As a result of lines 4002, 4007, and 3976 the field n_descsz in a Note (Elf32_Nhdr) has to be 0 (despite the fact that there are data following the Note).

Then in the following fragment (in riscv-gnu-toolchain/riscv-gdb/bfd/elf.c):

11701             p += ELF_NOTE_NEXT_OFFSET (in.namesz, in.descsz, align);

the GDB tries to step to the next Note. However the in.descsz is 0 (and has to be, as explained above) and the GDB does not reach the next Note, it steps through the data associated with the current Note (and interprets those data as the Notes, which is wrong).
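
The stepping arithmetic described above can be sketched in C as follows. This is an illustration of the usual ELF note layout (a 12-byte header, then the name and the descriptor, each padded to the alignment), not the bfd macro itself; align_up, note_next_offset and NOTE_HEADER_SIZE are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* n_namesz, n_descsz and n_type are three 32-bit words. */
enum { NOTE_HEADER_SIZE = 12 };

/* Round n up to a multiple of align, which must be a power of two. */
static size_t align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Offset from the start of one note to the start of the next one. */
static size_t note_next_offset(uint32_t namesz, uint32_t descsz, size_t align)
{
    size_t desc_offset = align_up(NOTE_HEADER_SIZE + namesz, align);
    return desc_offset + align_up(descsz, align);
}
```

For a note named "CORE" (namesz 5, including the terminating NUL) with 4-byte alignment, a descsz of 0 gives a next-note offset of 20, which is exactly where the descriptor data begins. That is why the data ends up being parsed as if it were another note.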

My impression is that the Core Dump Analysis has not been tested and does not work on the 32-bit platform (in riscv32-unknown-linux-gnu-gdb).

What am I doing wrong?


I can send you the (manually written) core file, the corresponding ELF file, the explanation of every byte in the core file, and the stack trace of how I bumped into this.


(Jim Wilson) #8

We only have hardware and desktop linux distros for 64-bit linux. OpenEmbedded on qemu can support 32-bit linux. I did check core file support once in response to a bug report, and there was a binutils patch required to make it work. Otherwise, I have done no 32-bit linux gdb testing.

bfd/ChangeLog
2019-04-22 Jim Wilson jimw@sifive.com

    * elfnn-riscv.c (PRSTATUS_SIZE) [ARCH_SIZE==32]: Change from 0 to 204.

It sounds like you are missing this patch.
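
As a cross-check of the value 204, here is a minimal C sketch of a 32-bit prstatus layout. It assumes the Linux struct elf_prstatus with every long, pid and timeval member 4 bytes wide, and a register block of 32 words (pc, then x1 to x31). The struct and field names are a hypothetical mirror written with fixed-width types; they are not taken from the kernel or from bfd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 32-bit timeval: two 4-byte fields. */
struct rv32_timeval { int32_t tv_sec, tv_usec; };

/* Assumed mirror of the 32-bit Linux struct elf_prstatus. */
struct rv32_prstatus {
    int32_t  si_signo, si_code, si_errno;   /* pr_info */
    int16_t  pr_cursig;                     /* current signal */
    uint16_t pad;                           /* explicit padding to 4 bytes */
    uint32_t pr_sigpend, pr_sighold;
    int32_t  pr_pid, pr_ppid, pr_pgrp, pr_sid;
    struct rv32_timeval pr_utime, pr_stime, pr_cutime, pr_cstime;
    uint32_t pr_reg[32];                    /* pc, x1..x31 */
    int32_t  pr_fpvalid;
};

/* Under these assumptions the size matches the patched PRSTATUS_SIZE. */
static_assert(sizeof(struct rv32_prstatus) == 204, "prstatus size");
static_assert(offsetof(struct rv32_prstatus, pr_cursig) == 12, "pr_cursig");
static_assert(offsetof(struct rv32_prstatus, pr_reg) == 72, "pr_reg");
```

A hand-written NT_PRSTATUS note for a 32-bit core file would then carry n_descsz = 204, with the register values starting 72 bytes into the descriptor.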


(Robin Kuzmin) #9

Jim,
do you happen to know how the stack segment data are represented in Linux core files?

  • Are they represented by a Note of a specific n_type (search for “n_type” here)
  • or are they represented by a Program Header with p_type == PT_GNU_STACK and handled by the following code in riscv-gnu-toolchain/riscv-gdb/bfd/elf.c (function bfd_section_from_phdr()):
// riscv-gnu-toolchain/riscv-gdb/bfd/elf.c:
3039 case PT_GNU_STACK:
3040   return _bfd_elf_make_section_from_phdr (abfd, hdr, hdr_index, "stack");

?


(Jim Wilson) #10

I only looked at the .reg and .reg2 sections as that was the only RISC-V specific part of the core files. But just generating a core file and taking a quick look I see that the stack is just a regular load section in the core dump file. Gdb on the core file says
(gdb) print $sp
$1 = (void *) 0x3fff99f1e0
and objdump -x on the core file has
25 load14 00021000 0000003fff97f000 0000000000000000 00016000 2**12
CONTENTS, ALLOC, LOAD
There are 14 of these sections, which map to various segments in the program and loaded shared libraries, one of which is the program stack.

PT_GNU_STACK is a special segment that indicates whether the stack is executable or not. On old systems it is usually executable, but on new systems it is not unless you have nested function trampolines that require an executable stack.
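
Since the stack is just a regular load section, a hand-made core file could describe a RAM region (stack included) with an ordinary PT_LOAD program header. The following is a minimal C sketch under that assumption. The struct mirrors the field order of Elf32_Phdr from the ELF specification, while core_phdr32, make_load_segment and the CORE_ constants are hypothetical local names, and the choice of read/write flags and an alignment of 1 is an assumption for a bare metal dump.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of Elf32_Phdr (eight 32-bit fields, in this order). */
struct core_phdr32 {
    uint32_t p_type, p_offset, p_vaddr, p_paddr;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
};

/* Values from the ELF specification. */
enum { CORE_PT_LOAD = 1, CORE_PF_W = 2, CORE_PF_R = 4 };

/* Describe `size` bytes of target memory at address `vaddr`, whose
   contents are stored at `file_offset` in the core file.  */
static struct core_phdr32 make_load_segment(uint32_t vaddr, uint32_t size,
                                            uint32_t file_offset)
{
    struct core_phdr32 ph = {0};
    assert(size != 0);
    ph.p_type   = CORE_PT_LOAD;
    ph.p_offset = file_offset;
    ph.p_vaddr  = vaddr;
    ph.p_paddr  = vaddr;                  /* assumed: no address translation */
    ph.p_filesz = size;                   /* every byte is present in the file */
    ph.p_memsz  = size;
    ph.p_flags  = CORE_PF_R | CORE_PF_W;  /* assumed: RAM is read/write */
    ph.p_align  = 1;                      /* assumed: no alignment constraint */
    return ph;
}
```

For example, make_load_segment(0x20000, 0x1000, 0x200) would describe 4 KiB of RAM at address 0x20000 stored 0x200 bytes into the file; both the address and the offset are made up for illustration.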