Bbl debugging using gdb


#1

Hello,

I want to debug bbl using gdb. I’ve compiled bbl with the -g option.
I have an FPGA board with 1 GB of DRAM starting at 0x8000_0000.

I ran riscv-gdb ./work/riscv-pk/bbl.elf and got:

Reading symbols from ./work/riscv-pk/bbl.elf…
(gdb) target remote localhost:3333
Remote debugging using localhost:3333
bfd requires flen 8, but target has flen 0

Can anyone help me, please?


(Jim Wilson) #2

flen is the FP register size. So gdb is telling you that the bbl binary was compiled for a target with 64-bit FP (e.g. lp64d), but the hardware target has no FP registers (e.g. rv64imac). Gdb can’t properly communicate with a target if gdb thinks the FP register size is different than the target FP register size. Also, gdb thinks that the code won’t run on the target.

If your target really doesn’t have FP registers, then try compiling bbl without FP instructions, e.g. using -mabi=lp64 instead of -mabi=lp64d. If your target does have FP registers, then there is something wrong on the target side. I don’t know how this stuff works when using an fpga board, unless perhaps you are using openocd, in which case you might have the wrong openocd configuration file.
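As a rough illustration of the mismatch gdb reports, flen follows from which FP extensions the ISA string names. The sketch below is not gdb's code; `flen_from_isa` is a hypothetical helper that assumes a lowercase ISA string such as the `riscv,isa` value in a device tree.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: FP register width in bytes implied by an ISA string
 * such as "rv64imac" or "rv64imafdc". Returns 0 when no F/D/Q is present. */
int flen_from_isa(const char *isa)
{
    int flen = 0;

    if (isa == NULL || strncmp(isa, "rv", 2) != 0)
        return 0;
    isa += 2;

    /* Skip the XLEN digits ("32", "64", "128"). */
    while (*isa >= '0' && *isa <= '9')
        isa++;

    /* Walk the single-letter extensions and stop at the first
     * multi-letter one (introduced by '_', 'z', 's' or 'x'). */
    for (; *isa != '\0'; isa++) {
        char c = *isa;

        if (c == '_' || c == 'z' || c == 's' || c == 'x')
            break;
        if (c == 'f' && flen < 4)
            flen = 4;                 /* single precision */
        else if ((c == 'd' || c == 'g') && flen < 8)
            flen = 8;                 /* double precision ('g' implies 'd') */
        else if (c == 'q')
            flen = 16;                /* quad precision */
    }
    return flen;
}
```

With this model, a binary built for lp64d expects flen 8, while `flen_from_isa("rv64imac")` gives 0, which is the pair of numbers in the gdb message.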

Also check your gdb version. FSF gdb 8.3 released last weekend is the first FSF gdb release with proper RISC-V support, though the github.com riscv-gnu-toolchain has had working support since about November.


#3

Thanks, Jim

Although I can’t load symbols, I’ve tested bbl using the UART, and I’ve found some issues.
One of them is related to the toolchain.
When I use FreedomStudio-4.7.2.2019-03/SiFive/toolchain/riscv64-unknown-elf-gcc-8.2.0-2019.02.0/bin/riscv64-unknown-elf-gcc, I can see the SiFive logo through minicom. However, when I use riscv64-buildroot-linux-gnu-gcc, which is built by buildroot, I don’t see any output in minicom.

The other is write_csr(mcounteren, -1) in mstatus_init(). When that code is executed, I get the following messages:
/home/cmlee/Workspace/SiFive/freedom-u-sdk/riscv-pk/machine/mtrap.c:21: machine mode: unhandlable trap 2 @ 0x0000000080001e34
Power off.
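For reading the message above: the number after "unhandlable trap" appears to be the mcause exception code, and code 2 is an illegal instruction in the RISC-V privileged spec. A small lookup sketch follows; `trap_cause_name` is a hypothetical helper, not part of riscv-pk.

```c
#include <stdint.h>

/* Hypothetical helper: name of a synchronous exception code (mcause with
 * the interrupt bit clear), following the RISC-V privileged spec. */
const char *trap_cause_name(uint64_t cause)
{
    switch (cause) {
    case 0:  return "instruction address misaligned";
    case 1:  return "instruction access fault";
    case 2:  return "illegal instruction";
    case 3:  return "breakpoint";
    case 4:  return "load address misaligned";
    case 5:  return "load access fault";
    case 6:  return "store/AMO address misaligned";
    case 7:  return "store/AMO access fault";
    case 8:  return "environment call from U-mode";
    case 9:  return "environment call from S-mode";
    case 11: return "environment call from M-mode";
    case 12: return "instruction page fault";
    case 13: return "load page fault";
    case 15: return "store/AMO page fault";
    default: return "reserved or unknown";
    }
}
```

If that reading is right, the CSR write itself is the faulting instruction, which would suggest (as an assumption, not something confirmed in this thread) that the core does not implement mcounteren in the form the code expects.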

Do you have any idea about these issues?

Regards,
cm


#4

I’ve attached my dts file. Please check it.

/dts-v1/;

/ {
	#address-cells = <1>;
	#size-cells = <1>;
	compatible = "freechips,rocketchip-unknown-dev";
	model = "freechips,rocketchip-unknown";
	L15: aliases {
		serial0 = &L9;
	};
	L11: chosen {
	};
	L14: cpus {
		#address-cells = <1>;
		#size-cells = <0>;
		L6: cpu@0 {
			clock-frequency = <0>;
			compatible = "sifive,rocket0", "riscv";
			d-cache-block-size = <64>;
			d-cache-sets = <64>;
			d-cache-size = <4096>;
			device_type = "cpu";
			i-cache-block-size = <64>;
			i-cache-sets = <64>;
			i-cache-size = <4096>;
			next-level-cache = <&L0 &L10>;
			reg = <0x0>;
			riscv,isa = "rv64imac";
			status = "okay";
			timebase-frequency = <1000000>;
			L4: interrupt-controller {
				#interrupt-cells = <1>;
				compatible = "riscv,cpu-intc";
				interrupt-controller;
			};
		};
	};
	L10: memory@80000000 {
		device_type = "memory";
		reg = <0x80000000 0x40000000>;
	};
	L13: soc {
		#address-cells = <1>;
		#size-cells = <1>;
		compatible = "freechips,rocketchip-unknown-soc", "simple-bus";
		ranges;
		L2: clint@2000000 {
			compatible = "riscv,clint0";
			interrupts-extended = <&L4 3 &L4 7>;
			reg = <0x2000000 0x10000>;
			reg-names = "control";
		};
		L3: debug-controller@0 {
			compatible = "sifive,debug-013", "riscv,debug-013";
			interrupts-extended = <&L4 65535>;
			reg = <0x0 0x1000>;
			reg-names = "control";
		};
		L0: error-device@3000 {
			compatible = "sifive,error0";
			reg = <0x3000 0x1000>;
		};
		L1: interrupt-controller@c000000 {
			#interrupt-cells = <1>;
			compatible = "riscv,plic0";
			interrupt-controller;
			interrupts-extended = <&L4 11>;
			reg = <0xc000000 0x4000000>;
			reg-names = "control";
			riscv,max-priority = <1>;
			riscv,ndev = <1>;
		};
		L7: rom@10000 {
			compatible = "sifive,maskrom0";
			reg = <0x10000 0x2000>;
			reg-names = "mem";
		};
		L9: serial@20000000 {
			clocks = <&L8>;
			compatible = "sifive,uart0";
			interrupt-parent = <&L1>;
			interrupts = <1>;
			reg = <0x20000000 0x1000>;
			reg-names = "control";
		};
		L8: tlclk {
			#clock-cells = <0>;
			clock-frequency = <100000000>;
			clock-output-names = "tlclk";
			compatible = "fixed-clock";
		};
	};
};

#5

Hi, Jim

Thanks to your answer, I found my mistakes and fixed them: ISA=rv64imac and ABI=lp64. After fixing them, I can use gdb with bbl.

Thanks.


#6

Hello,

In summary, I found out why bbl didn’t work well. It was because of uninitialized variables (i.e. bss).
I think the bss area should be cleared, but I can’t find where that happens in riscv-pk.

Regards,
cm


(Jim Wilson) #7

I would expect that the code that loads bbl into memory would clear the bss as part of that operation. If you look at the program headers of the ELF file you will see

    LOAD off    0x0000000000008000 vaddr 0x0000000080007000 paddr 0x0000000080007000 align 2**12
         filesz 0x0000000000001076 memsz 0x000000000000b080 flags rw-

The difference between the filesz and the memsz is the bss. The loader is supposed to copy filesz bytes from the ELF file, and then zero the rest of the bytes up to memsz. You can use objdump or readelf to look at the program headers. I don’t know how booting works on an FPGA, so I don’t know the details here though.
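The loader behavior described above can be sketched as follows. This is a minimal model rather than any particular boot loader's code, and `load_segment` and `bss_size` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical loader step for one PT_LOAD segment: copy the bytes that
 * exist in the file, then zero the remainder up to memsz (the bss). */
void load_segment(unsigned char *dest, const unsigned char *file_bytes,
                  size_t filesz, size_t memsz)
{
    memcpy(dest, file_bytes, filesz);
    if (memsz > filesz)
        memset(dest + filesz, 0, memsz - filesz);
}

/* Size of the zero-filled tail of a segment. */
size_t bss_size(size_t filesz, size_t memsz)
{
    return memsz > filesz ? memsz - filesz : 0;
}
```

For the segment shown above, `bss_size(0x1076, 0xb080)` is 0xa00a bytes.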

Alternatively, if bss isn’t being cleared, then you can do it manually. There should be symbols _bss_start and _end in the program. You can add a loop at the beginning of the program to store zeroes from _bss_start to _end to clear bss. This needs to be done as early as possible. An embedded system with no boot loader would do this in _start before jumping to main, but you might be able to do it in main if _start doesn’t do much.
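A minimal sketch of such a clearing loop is below. `clear_range` is a hypothetical helper, and the symbol names in the comment are the ones mentioned above; check the linker script for the names your build actually defines.

```c
/* Zero every byte in [start, end). Written as a plain loop so it does not
 * depend on any library code that might itself rely on bss. */
void clear_range(unsigned char *start, unsigned char *end)
{
    while (start < end)
        *start++ = 0;
}

/*
 * Intended use early in startup, assuming the linker script provides
 * these symbols:
 *
 *     extern char _bss_start[], _end[];
 *     clear_range((unsigned char *)_bss_start, (unsigned char *)_end);
 */
```

The call has to run before any code reads a zero-initialized global, otherwise that code may see whatever was left in DRAM.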

Since bbl is essentially a kernel, it might be clearing memory itself some other way. I’ve never looked at the details of how bbl works.


#8

Thanks.
I’ve already done what you describe. I call memset(&_bss_start, 0, &_bss_end - &_bss_start) as soon as init_first_hart() is entered. I’m loading bbl using gdb over openocd.
In my case, the uart global variable was not initialized to zero (it was 0xfffffffff), so it couldn’t get past the code below.

static void uart_done(const struct fdt_scan_node *node, void *extra)
{
  struct uart_scan *scan = (struct uart_scan *)extra;
  if (!scan->compat || !scan->reg || uart) return;

#9

Some questions.

  1. To boot Linux, should bbl be compiled with --with-payload?
  2. I’ve loaded the dtb and vmlinux.bin into DRAM using openocd, and then loaded bbl using gdb.
    So I’ve hardcoded some variables in bbl: kernel_start, kernel_end, and the dtb location.
    I’m stuck in enter_supervisor_mode. I can’t get to the next step, which is the Linux kernel.

I have just 1 hart. Could this be a problem?

Do you have any idea?


#10

This is where I am now:

[    0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
[    0.000000] Linux version 4.19.0-sifive-1+ (cmlee@cmlee) (gcc version 8.3.0 (Buildroot 2019.02-07449-g4eddd28f99)) #1 SMP Wed May 15 13:28:28 KST 2019
[    0.000000] bootconsole [early0] enabled
[    0.000000] initrd not found or empty - disabling initrd
[    0.000000] Zone ranges:
[    0.000000]   DMA32    [mem 0x0000000080200000-0x00000000bfffffff]
[    0.000000]   Normal   [mem 0x00000000c0000000-0x00000bffffffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000080200000-0x00000000bfffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x00000000bfffffff]
[    0.000000] software IO TLB: mapped [mem 0xbb1fd000-0xbf1fd000] (64MB)
[    0.000000] elf_hwcap is 0x112d
[    0.000000] percpu: Embedded 17 pages/cpu @(____ptrval____) s29400 r8192 d32040 u69632
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 258055
[    0.000000] Kernel command line: earlyprintk
[    0.000000] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
[    0.000000] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
[    0.000000] Sorting __ex_table...
[    0.000000] Memory: 954084K/1046528K available (6565K kernel code, 343K rwdata, 2563K rodata, 208K init, 832K bss, 92444K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu:     RCU event tracing is enabled.
[    0.000000] rcu:     RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=1.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[    0.000000] NR_IRQS: 0, nr_irqs: 0, preallocated irqs: 0
[    0.000000] plic: mapped 1 interrupts to 1 (out of 2) handlers.
[    0.000000] Kernel panic - not syncing: 4RISC-V system with no 'timebase-frequency' in DTS
[    0.000000]
[    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0-sifive-1+ #1
[    0.000000] Call Trace:
[    0.000000] [<ffffffe000036390>] walk_stackframe+0x0/0xa0
[    0.000000] [<ffffffe00003658c>] show_stack+0x2a/0x34
[    0.000000] [<ffffffe000687c38>] dump_stack+0x62/0x7c
[    0.000000] [<ffffffe00003a828>] panic+0xd2/0x1e8
[    0.000000] [<ffffffe0000022f2>] time_init+0x28/0x64
[    0.000000] [<ffffffe00000084e>] start_kernel+0x212/0x338
[    0.000000] [<ffffffe000000058>] _sinittext+0x58/0x5c

#11

My progress is blocked by the error below.

Kernel panic - not syncing: 4RISC-V system with no 'timebase-frequency' in DTS

Something is wrong. timebase-frequency exists in bbl, but after the kernel boots, it disappears.
I’ve printed the whole fdt in the kernel and I can’t find “timebase-frequency”.

When I tested bbl and Linux on qemu, it worked well and “timebase-frequency” was present.

What is the “name = name” in the kernel log?

device tree dump in linux

[    0.000000] [__of_find_property:211] name = next-level-cache
[    0.000000] [__of_find_property:211] name = reg
[    0.000000] [__of_find_property:211] name = riscv,isa
[    0.000000] [__of_find_property:211] name = status
[    0.000000] [__of_find_property:211] name = clock-frequency
[    0.000000] [__of_find_property:211] name = compatible
[    0.000000] [__of_find_property:211] name = d-cache-block-size
[    0.000000] [__of_find_property:211] name = d-cache-sets
[    0.000000] [__of_find_property:211] name = d-cache-size
[    0.000000] [__of_find_property:211] name = d-tlb-sets
[    0.000000] [__of_find_property:211] name = d-tlb-size
[    0.000000] [__of_find_property:211] name = device_type
[    0.000000] [__of_find_property:211] name = i-cache-block-size
[    0.000000] [__of_find_property:211] name = i-cache-sets
[    0.000000] [__of_find_property:211] name = i-cache-size
[    0.000000] [__of_find_property:211] name = i-tlb-sets
[    0.000000] [__of_find_property:211] name = i-tlb-size
[    0.000000] [__of_find_property:211] name = mmu-type
[    0.000000] [__of_find_property:211] name = next-level-cache
[    0.000000] [__of_find_property:211] name = reg
[    0.000000] [__of_find_property:211] name = riscv,isa
[    0.000000] plic: mapped 1 interrupts to 1 (out of 2) handlers.
[    0.000000] [time_init:28] cpu = (____ptrval____)
[    0.000000] [__of_find_property:211] name = #address-cells
[    0.000000] [__of_find_property:211] name = #size-cells
[    0.000000] [__of_find_property:211] name = name
[    0.000000] [time_init:31] ret = -22
[    0.000000] Kernel panic - not syncing: 4RISC-V system with no 'timebase-frequency' in DTS

device tree dump in bbl

  cpus {
    #address-cells = <0x00000001>;
    #size-cells = <0x00000000>;
    cpu@0 {
      clock-frequency = <0x00000000>;
      compatible = "sifive,rocket0", "riscv";
      d-cache-block-size = <0x00000040>;
      d-cache-sets = <0x00000040>;
      d-cache-size = <0x00004000>;
      d-tlb-sets = <0x00000001>;
      d-tlb-size = <0x00000020>;
      device_type = "cpu";
      i-cache-block-size = <0x00000040>;
      i-cache-sets = <0x00000040>;
      i-cache-size = <0x00004000>;
      i-tlb-sets = <0x00000001>;
      i-tlb-size = <0x00000020>;
      mmu-type = "riscv,sv39";
      next-level-cache = <0x00000001 0x00000002>;
      reg = <0x00000000>;
      riscv,isa = "rv64imafdc";
      status = "okay";
      timebase-frequency = <0x000f4240>;
      tlb-split;
      interrupt-controller {
        #interrupt-cells = <0x00000001>;
        compatible = "riscv,cpu-intc";
        interrupt-controller;
        phandle = <0x00000003>;
      }
    }
  }

(Paul Walmsley) #12

Consider rebuilding BBL with the --enable-print-device-tree flag provided to “configure”, and observing the DT printed during early boot. That might help determine what to do next.

If you’ve already done this, and are observing that the timebase-frequency property is printed by the early BBL dump, but is not found by the Linux kernel, you might consider applying the following patch to riscv-pk to see if it helps: https://github.com/sifive/riscv-pk/commit/a69cb712603339558b4d41c4cbfb03ca7de7ebd6


#13

Thanks for your reply. I’m already using --enable-print-device-tree, as you suggested.

I will try the patch you suggested and let you know the result.

Thanks.
cm


#14

I’m afraid applying your patch didn’t fix it. However, I found that timebase-frequency has not disappeared. I didn’t notice it previously.

I’m wondering why the prop->name entries of the nodes appear out of order. Sometimes the same name is repeated.

[    0.000000] [__of_find_property:211] interrupt-controller->name
[    0.000000] [__of_find_property:211] cpu->clock-frequency
[    0.000000] [__of_find_property:211] cpu->compatible
[    0.000000] [__of_find_property:211] cpu->d-cache-block-size
[    0.000000] [__of_find_property:211] cpu->d-cache-sets
[    0.000000] [__of_find_property:211] cpu->d-cache-size
[    0.000000] [__of_find_property:211] cpu->d-tlb-sets
[    0.000000] [__of_find_property:211] cpu->d-tlb-size
[    0.000000] [__of_find_property:211] cpu->device_type
[    0.000000] [__of_find_property:211] cpu->i-cache-block-size
[    0.000000] [__of_find_property:211] cpu->i-cache-sets
[    0.000000] [__of_find_property:211] cpu->i-cache-size
[    0.000000] [__of_find_property:211] cpu->i-tlb-sets
[    0.000000] [__of_find_property:211] cpu->i-tlb-size
[    0.000000] [__of_find_property:211] cpu->mmu-type
[    0.000000] [__of_find_property:211] cpu->next-level-cache
[    0.000000] [__of_find_property:211] cpu->reg
[    0.000000] [__of_find_property:211] cpu->riscv,isa
[    0.000000] [__of_find_property:211] cpu->status
[    0.000000] [__of_find_property:211] cpu->timebase-frequency
[    0.000000] [__of_find_property:211] cpu->tlb-split
[    0.000000] [__of_find_property:211] cpu->name
[    0.000000] [__of_find_property:211] cpus->#address-cells
~ snip ~
[    0.000000] [__of_find_property:211] interrupt-controller->#interrupt-cells
[    0.000000] [__of_find_property:211] interrupt-controller->#interrupt-cells
[    0.000000] [__of_find_property:211] interrupt-controller->#interrupt-cells
[    0.000000] [__of_find_property:211] interrupt-controller->#interrupt-cells
~ snip ~
[    0.000000] [__of_find_property:211] cpu->riscv,isa
[    0.000000] [__of_find_property:211] cpu->status
[    0.000000] [__of_find_property:211] cpu->timebase-frequency
[    0.000000] [__of_find_property:211] cpu->tlb-split
[    0.000000] [__of_find_property:211] cpu->name
[    0.000000] [__of_find_property:211] cpus->#address-cells
[    0.000000] [__of_find_property:211] interrupt-controller->#interrupt-cells
~ snip ~
[    0.000000] [__of_find_property:211] cpu->i-cache-block-size
[    0.000000] [__of_find_property:211] cpu->i-cache-sets
[    0.000000] [__of_find_property:211] cpu->i-cache-size
[    0.000000] [__of_find_property:211] cpu->i-tlb-sets
[    0.000000] [__of_find_property:211] cpu->i-tlb-size
[    0.000000] [__of_find_property:211] cpu->mmu-type
[    0.000000] [__of_find_property:211] cpu->next-level-cache
[    0.000000] [__of_find_property:211] cpu->reg
[    0.000000] [__of_find_property:211] cpu->riscv,isa
[    0.000000] plic: mapped 1 interrupts to 1 (out of 2) handlers.
[    0.000000] [time_init:28] cpu = (____ptrval____)
[    0.000000] [__of_find_property:211] cpus->#address-cells
[    0.000000] [__of_find_property:211] cpus->#size-cells
[    0.000000] [__of_find_property:211] cpus->name
[    0.000000] [time_init:31] ret = -22
[    0.000000] Kernel panic - not syncing: 4RISC-V system with no 'timebase-frequency' in DTS
[    0.000000]
[    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0-sifive-1+ #2

#15

timebase-frequency is located in “cpu”, not in its parent “cpus”.
But I see cpu = of_find_node_by_path("/cpus"); in arch/riscv/kernel/time.c.
Is this enough?
Does it need something that finds the child cpu node, such as for_each_available_child_of_node(parent, child)?

[    0.000000]   [__of_find_property:213] cpu[18]->timebase-frequency
[    0.000000]   [__of_find_property:213] cpu[19]->tlb-split
[    0.000000]   [__of_find_property:213] cpu[20]->name
[    0.000000] [__of_find_property:221] cpu->nr_prop = 21
[    0.000000] plic: mapped 1 interrupts to 1 (out of 2) handlers.
[    0.000000] [time_init:28] cpu = (____ptrval____)
[    0.000000] [__of_find_property:211] cpus->properties = (____ptrval____)
[    0.000000]   [__of_find_property:213] cpus[0]->#address-cells
[    0.000000]   [__of_find_property:213] cpus[1]->#size-cells
[    0.000000]   [__of_find_property:213] cpus[2]->name
[    0.000000] [__of_find_property:221] cpus->nr_prop = 3
[    0.000000] [time_init:31] ret = -22
[    0.000000] Kernel panic - not syncing: 4RISC-V system with no 'timebase-frequency' in DTS
[    0.000000]

#16

Sorry for being so verbose.

I’ve fixed the timebase-frequency issue by changing the lookup to cpu = of_find_node_by_path("/cpus/cpu@0").
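Why the original lookup returned -22 can be shown with a toy model of the tree. This is not the kernel's OF API: the structs, `read_u32`, and `find_child` are hypothetical stand-ins, and the only point is that a property lookup on a node does not search its children.

```c
#include <stddef.h>
#include <string.h>

/* Toy device-tree model (hypothetical, not the kernel's data structures). */
struct prop { const char *name; unsigned value; };
struct node {
    const char *name;
    const struct prop *props;    size_t nr_props;
    const struct node *children; size_t nr_children;
};

/* Look up a property on this node only; children are not searched.
 * Returns 0 on success, -22 (-EINVAL) if the property is absent. */
int read_u32(const struct node *np, const char *name, unsigned *out)
{
    for (size_t i = 0; np != NULL && i < np->nr_props; i++) {
        if (strcmp(np->props[i].name, name) == 0) {
            *out = np->props[i].value;
            return 0;
        }
    }
    return -22;
}

/* Find a direct child by name, or NULL if there is none. */
const struct node *find_child(const struct node *parent, const char *name)
{
    for (size_t i = 0; parent != NULL && i < parent->nr_children; i++) {
        if (strcmp(parent->children[i].name, name) == 0)
            return &parent->children[i];
    }
    return NULL;
}

/* Same shape as the bbl dump: the property lives in cpu@0, not in cpus. */
static const struct prop cpu0_props[] = { { "timebase-frequency", 1000000 } };
static const struct node cpu_nodes[]  = { { "cpu@0", cpu0_props, 1, NULL, 0 } };
static const struct prop cpus_props[] = { { "#address-cells", 1 },
                                          { "#size-cells", 0 } };
static const struct node cpus_node    = { "cpus", cpus_props, 2, cpu_nodes, 1 };
```

In this model, `read_u32(&cpus_node, "timebase-frequency", &v)` returns -22, while the same call on `find_child(&cpus_node, "cpu@0")` returns 0 and stores 1000000.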

This is where I am now, but I have no idea why booting stops.

~ snip ~
[    0.010000] Console: colour dummy device 80x25
[    0.010000] console [tty0] enabled
[    0.010000] bootconsole [early0] disabled

(Jim Wilson) #17

That is perhaps just a serial console issue: the boot may have succeeded, and you just don’t have a working serial console. If you have an ethernet device, try pinging it. The “Console: colour dummy device 80x25” suggests that it might be trying to use a video device as the console. If you have pci, try plugging in a video card and attaching a monitor. Otherwise, try disabling the video terminal options in the linux defconfig file. Also check CONFIG_HVC_RISCV_SBI=y: try defining it if it isn’t defined, or undefining it if it is defined, as this one is known to be an issue for some hardware/simulators.


#18

I’ve added “console=ttyS0” to cmd_line. After that, the message “bootconsole [early0] disabled” appears later in the log.
But there is no further progress.
I’ll try your suggestion, “CONFIG_HVC_RISCV_SBI=y”.

~ snip ~
[ 1.590000] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[ 1.610000] 20000000.serial: ttySIF0 at MMIO 0x20000000 (irq = 1, base_baud = 0) is a sifive-serial
~ snip ~
[ 2.140000] Key type dns_resolver registered
[ 2.150000] bootconsole [early0] uses init memory and must be disabled even before the real one is ready
[ 2.160000] bootconsole [early0] disabled


#19

I’m sorry, HVC_RISCV_SBI has no effect in my case.

I only have a UART on my FPGA. There are no other devices such as video, ethernet, or PCI.
More precisely, they are not enabled yet. It is a very minimal hardware configuration.

Could you check my dts file and the kernel log for the UART?

  • dts for uart
		L9: serial@20000000 {
			clocks = <&L8>;
			compatible = "sifive,uart0";
			interrupt-parent = <&L1>;
			interrupts = <1>;
			reg = <0x20000000 0x1000>;
			reg-names = "control";
		};
		L8: tlclk {
			#clock-cells = <0>;
			clock-frequency = <100000000>;
			clock-output-names = "tlclk";
			compatible = "fixed-clock";
		};
  • kernel log
[    0.000000] Kernel command line: earlyprintk console=ttyS0 console=ttySIF0,115200n1
~ snip ~
[ 1.590000] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[ 1.610000] 20000000.serial: ttySIF0 at MMIO 0x20000000 (irq = 1, base_baud = 0) is a sifive-serial


(Jim Wilson) #20

I’m not a kernel expert, but maybe Paul will see this and make a suggestion.