U-Boot says Unhandled exception: Illegal instruction

I had successfully installed Ubuntu 21 to the SD card and the NVME SSD (Samsung 970 EVO) yesterday, and it was working fine. Now today this is emitted during boot:

U-Boot SPL 2021.01+dfsg-3ubuntu9 (Apr 21 2021 - 17:05:00 +0000)
Trying to boot from MMC1


U-Boot 2021.01+dfsg-3ubuntu9 (Apr 21 2021 - 17:05:00 +0000)

CPU:   rv64imafdc
Model: SiFive HiFive Unmatched A00
DRAM:  16 GiB
MMC:   spi@10050000:mmc@0: 0
EEPROM: SiFive PCB EEPROM format v1
Serial number: SF105SZ212200723
PCB revision: 3
Ethernet MAC address: 70:b3:d5:92:f9:87
CRC: 553e04e9
EEPROM dump: (0x25 bytes)
00: F1 5E 50 45 01 02 00 03 42 00 53 46 31 30 35 53 
10: 5A 32 31 32 32 30 30 37 32 33 01 70 B3 D5 92 F9 
20: 87 E9 04 3E 55 
found SiFive v1
In:    serial@10010000
Out:   serial@10010000
Err:   serial@10010000
Model: SiFive HiFive Unmatched A00
Net:   eth0: ethernet@10090000
Hit any key to stop autoboot:  0 
PCIe Link up, Gen1

Device 0: Vendor: 0x144d Rev: 2BUnhandled exception: Illegal instruction
EPC: 0000000000000000 RA: 0000000000000000 TVAL: 0000000000000000
EPC: ffffffff8029d000 RA: ffffffff8029d000 reloc adjusted


resetting ...
System reset not supported on this platform
### ERROR ### Please RESET the board ###

I figured some corruption may have occurred when I forced the machine to power off last night, so I re-imaged the SD card, and then the SSD using a USB enclosure, but it still happens whenever the SSD is plugged in.

Your SSD may be incompatible with the board. SiFive suggests the Samsung 970 Plus; in fact, the HP EX900 250G also works.
Change your SSD to an older model and try again.

Does it boot from just the SD card without the NVME present?

I had an SD card completely fail on me during my initial testing with Ubuntu… I’m unsure if the board, or even the OS, did something to it or if I just had a bad card, because it hasn’t happened again with a new one… But it might be worth checking, as it might be an indicator of something worse going on.

Edit: Thinking about it, it actually happened shortly after I switched to using an NVME drive.

I also had the first (brand new, Sandisk Extreme 32 GB) SD card I tried fail. I believe I hadn’t even put it into the Unmatched (so it’s not its fault). I wrote Ubuntu onto it and it wouldn’t verify. I couldn’t do anything to revive it and threw it away. I had spare cards on hand and the next one worked fine.

@spaceotter Could you try running diagnostics (for example, smartctl) on the SSD from another system?

If you have a USB-to-SSD adapter, you could also run the diagnostics on the Unmatched board itself: boot Linux from the microSD card, then plug in the SSD through the adapter.

It would be helpful to isolate this as a HW or SW issue.

If you have an image on NVMe, then the bootloader will try to boot from NVMe, but this doesn’t always work. The default boot order puts NVMe before the SD card. Your SD card is probably still OK, and your NVMe drive is probably OK too. You can interrupt booting at the prompt “Hit any key to stop autoboot” before the count hits zero. Then, at the U-Boot prompt, enter “run bootcmd_mmc0” to force a boot from the SD card. This is documented in the SW Reference Manual on the SiFive web site. You can find the docs at the bottom of this page:
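For anyone who wants to script the interrupt-and-boot-from-SD step over the serial console, here is a minimal Python sketch. It is only an illustration: the "=> " prompt string is an assumption (check what your U-Boot build prints), the console object is a hypothetical stand-in with read() and write() methods, and a real serial port would deliver bytes rather than str.

```python
import io

AUTOBOOT_PROMPT = "Hit any key to stop autoboot"  # text seen in the boot log
UBOOT_PROMPT = "=> "            # assumption: default U-Boot prompt string
BOOT_COMMAND = "run bootcmd_mmc0"


def _wait_for(console, marker):
    """Read one character at a time until the output ends with `marker`."""
    seen = ""
    while not seen.endswith(marker):
        chunk = console.read(1)
        if not chunk:           # stream ended before the marker appeared
            return False
        seen += chunk
    return True


def force_sd_boot(console):
    """Stop autoboot, then ask U-Boot to boot from the SD card.

    `console` is any object with read(n) -> str and write(str).
    Returns True if both prompts were seen and the command was sent.
    """
    if not _wait_for(console, AUTOBOOT_PROMPT):
        return False
    console.write("\n")         # any key press stops the countdown
    if not _wait_for(console, UBOOT_PROMPT):
        return False
    console.write(BOOT_COMMAND + "\n")
    return True


class FakeConsole:
    """In-memory stand-in for a serial console, for trying the logic out."""

    def __init__(self, output):
        self._out = io.StringIO(output)
        self.sent = []

    def read(self, n=1):
        return self._out.read(n)

    def write(self, text):
        self.sent.append(text)


if __name__ == "__main__":
    demo = FakeConsole("Net:   eth0\nHit any key to stop autoboot:  2 \n=> ")
    print(force_sd_boot(demo), demo.sent)
```

Typing the command by hand at the prompt does the same thing; the script only saves you from racing the countdown.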
https://www.sifive.com/boards/hifive-unmatched

You can set things up to boot from the SD card and use the NVMe drive for root, which is more stable than booting directly from NVMe. If you want to keep your NVMe image but stop booting from it, then I think all you need to do is remove or rename the extlinux.conf file on the fourth partition, as I think that is what the boot loader looks for.
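A rough Python sketch of the rename step, assuming the NVMe partition is already mounted somewhere after booting from the SD card. The mount point and the ".disabled" suffix are hypothetical, and the function searches the whole partition because the exact directory holding extlinux.conf is not stated here.

```python
from pathlib import Path


def disable_extlinux(mount_point, suffix=".disabled"):
    """Rename every extlinux.conf found under `mount_point`.

    Renaming (rather than deleting) keeps the image intact, so the
    change can be undone by renaming the file back.
    Returns the list of new paths.
    """
    renamed = []
    # sorted() materialises the matches before any file is renamed.
    for conf in sorted(Path(mount_point).rglob("extlinux.conf")):
        target = conf.with_name(conf.name + suffix)
        conf.rename(target)
        renamed.append(target)
    return renamed


if __name__ == "__main__":
    # "/mnt/nvme" is a placeholder; use wherever the partition is mounted.
    for path in disable_extlinux("/mnt/nvme"):
        print("renamed to", path)
```

Run it as root if the partition is mounted with root-only permissions.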

I have an image on my nvme card and sometimes it boots fine and sometimes it doesn’t. Usually if I powercycle it will eventually succeed in booting directly from nvme. I don’t think it is known what the problem is. If you have USB devices attached (other than keyboard/mouse), you might try removing them to see if that makes the system boot better from nvme. In general, I’ve found that a usb thumb drive seems to cause trouble when attached. It may also make a difference which USB port devices are plugged into. Looking at the back of the board, the port on the upper left is directly connected to the pcie switch, the other three are not. So that port on the upper left tends to work better than the other three.


The device boots from the SD card with the NVME drive removed.
Using run bootcmd_mmc0 seems to work as well. There are no attached USB devices.

Smartctl says:

smartctl 7.2 2020-12-30 r5155 [riscv64-linux-5.11.0-1012-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO 1TB
Serial Number:                      S5H9NS0NA53020K
Firmware Version:                   2B2QEXE7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      4
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Utilization:            50,547,662,848 [50.5 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 5a0141f2dd
Local Time is:                      Wed Jun 30 18:05:43 2021 UTC
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     85 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.20W       -        -    0  0  0  0        0       0
 1 +     4.30W       -        -    1  1  1  1        0       0
 2 +     2.10W       -        -    2  2  2  2        0       0
 3 -   0.0400W       -        -    3  3  3  3      210    1200
 4 -   0.0050W       -        -    4  4  4  4     2000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        44 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    791 [404 MB]
Data Units Written:                 99,197 [50.7 GB]
Host Read Commands:                 29,160
Host Write Commands:                37,991
Controller Busy Time:               0
Power Cycles:                       69
Power On Hours:                     17
Unsafe Shutdowns:                   61
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               44 Celsius
Temperature Sensor 2:               54 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

The NVME mysteriously started working again after I used run bootcmd_mmc0 to boot from the SD card and checked it with smartctl. If it happens again I can report whether this is repeatable.
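If it helps to compare reports between boots, here is a small Python sketch that pulls the plain integer counters out of smartctl's text output. The parsing is an assumption based only on the layout of the report pasted above ("Name: value"), not on any documented smartctl format, and the sample lines are copied from that report.

```python
import re

# Matches values like "69", "99,197" or "791 [404 MB]"; values with
# units such as "44 Celsius" or "100%" are deliberately skipped.
_COUNT = re.compile(r"\s*([\d,]+)\s*(?:\[.*\])?\s*$")


def parse_smart_counters(report):
    """Return {field name: integer} for the plain counters in `report`."""
    counters = {}
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        match = _COUNT.match(value)
        if match:
            counters[key.strip()] = int(match.group(1).replace(",", ""))
    return counters


SAMPLE = """\
Temperature:                        44 Celsius
Data Units Read:                    791 [404 MB]
Power Cycles:                       69
Power On Hours:                     17
Unsafe Shutdowns:                   61
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
"""

if __name__ == "__main__":
    counters = parse_smart_counters(SAMPLE)
    clean = counters["Power Cycles"] - counters["Unsafe Shutdowns"]
    print("unsafe shutdowns:", counters["Unsafe Shutdowns"])
    print("power cycles not recorded as unsafe:", clean)
```

In the report above, 61 of the 69 power cycles are recorded as unsafe shutdowns, while the media and error-log counters are both zero.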

I too got this error, but it wasn’t because of the SSD. I had originally thought it was due to an overclock, but I booted an older kernel version, and everything is back to normal.

I’m currently running the Ubuntu server image off SSD. I’m back to 5.11.0-1012-generic

This may or may not be related to the original issue, but I’m now “stuck” on 1013 and 1014, and I’m unsure where to find the debs to roll back to 1012.

With 1014 I get the following,

Starting kernel ...

[    0.000000] Linux version 5.11.0-1014-generic (buildd@riscv64-qemu-lcy01-084) (gcc (Ubuntu 10.3.0-1ubuntu1) 10.3.0, GNU ld (GNU Binutils for Ubuntu) 2.36.1) #14-Ubuntu SMP Wed Jun 30 17:56:50 UTC 2021 (Ubuntu 5.11.0-1014.14-generic 5.11.22)
[    0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
[    0.000000] earlycon: sifive0 at MMIO 0x0000000010010000 (options '')
[    0.000000] printk: bootconsole [sifive0] enabled
[    0.000000] efi: UEFI not found.
[    0.000000] Initial ramdisk at: 0x(____ptrval____) (183422976 bytes)
[    0.000000] cma: Reserved 32 MiB at 0x00000000fe000000
[    0.000000] Zone ranges:
[    0.000000]   DMA32    [mem 0x0000000080200000-0x00000000ffffffff]
[    0.000000]   Normal   [mem 0x0000000100000000-0x000000047fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000080200000-0x000000047fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x000000047fffffff]
[    0.000000]   DMA32 zone: 512 pages in unavailable ranges
[    0.000000] SBI specification v0.2 detected
[    0.000000] SBI implementation ID=0x1 Version=0x9
[    0.000000] SBI v0.2 TIME extension detected
[    0.000000] SBI v0.2 IPI extension detected
[    0.000000] SBI v0.2 RFENCE extension detected
[    0.000000] software IO TLB: mapped [mem 0x00000000fa000000-0x00000000fe000000] (64MB)
[    0.000000] SBI v0.2 HSM extension detected
[    0.000000] CPU with hartid=0 is not available
[    0.000000] CPU with hartid=0 is not available
[    0.000000] riscv: ISA extensions acdfim
[    0.000000] riscv: ELF capabilities acdfim
[    0.000000] percpu: Embedded 26 pages/cpu s69272 r8192 d29032 u106496
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 4128264
[    0.000000] Kernel command line: root=/dev/nvme0n1p1 ro earlycon
[    0.000000] Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes, linear)
[    0.000000] Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear)
[    0.000000] Sorting __ex_table...
[    0.000000] mem auto-init: stack:off, heap alloc:on, heap free:off
[    0.000000] Memory: 16165452K/16775168K available (9854K kernel code, 5763K rwdata, 8192K rodata, 2519K init, 997K bss, 576948K reserved, 32768K cma-reserved)
[    0.000000] random: get_random_u64 called from kmem_cache_open+0x36/0x338 with crng_init=0
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.000000] ftrace: allocating 38893 entries in 152 pages
[    0.000000] Oops - illegal instruction [#1]
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.11.0-1014-generic #14-Ubuntu
[    0.000000] epc: ffffffe00000920e ra : ffffffe000009384 sp : ffffffe001803d30
[    0.000000]  gp : ffffffe001a14240 tp : ffffffe00180f440 t0 : ffffffe07fe38000
[    0.000000]  t1 : ffffffe0019cd338 t2 : 0000000000000000 s0 : ffffffe001803d70
[    0.000000]  s1 : 0000000000000000 a0 : ffffffe0000095aa a1 : 0000000000000001
[    0.000000]  a2 : 0000000000000002 a3 : 0000000000000000 a4 : 0000000000000000
[    0.000000]  a5 : 0000000000000000 a6 : 0000000000000004 a7 : 0000000052464e43
[    0.000000]  s2 : 0000000000000002 s3 : 0000000000000001 s4 : 0000000000000000
[    0.000000]  s5 : 0000000000000000 s6 : 0000000000000000 s7 : 0000000000000000
[    0.000000]  s8 : ffffffe001a170c0 s9 : 0000000000000001 s10: 0000000000000001
[    0.000000]  s11: 00000000fffcc5d0 t3 : 0000000000000068 t4 : 000000000000000b
[    0.000000]  t5 : ffffffe0019cd3e0 t6 : ffffffe001803cd8
[    0.000000] status: 0000000200000100 badaddr: 000000000513f187 cause: 0000000000000002
[    0.000000] ---[ end trace f67eb9af4d8d492b ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

I’m also not sure where to report this issue, or who maintains this kernel.
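For reading the "status / badaddr / cause" line in the oops above, here is a short Python sketch. The cause names follow the standard RISC-V supervisor exception codes. Treating badaddr as the encoding of the faulting instruction is an assumption: the architecture allows an implementation to report either the instruction bits or zero for an illegal-instruction trap.

```python
import re

# Standard RISC-V synchronous exception codes (interrupt bit clear).
SCAUSE_NAMES = {
    0: "Instruction address misaligned",
    1: "Instruction access fault",
    2: "Illegal instruction",
    3: "Breakpoint",
    4: "Load address misaligned",
    5: "Load access fault",
    6: "Store/AMO address misaligned",
    7: "Store/AMO access fault",
    8: "Environment call from U-mode",
    9: "Environment call from S-mode",
    12: "Instruction page fault",
    13: "Load page fault",
    15: "Store/AMO page fault",
}


def decode_status_line(line):
    """Decode the badaddr/cause fields of a RISC-V kernel oops line."""
    match = re.search(r"badaddr: ([0-9a-f]+) cause: ([0-9a-f]+)", line)
    if match is None:
        raise ValueError("no badaddr/cause fields found")
    badaddr = int(match.group(1), 16)
    cause = int(match.group(2), 16)
    info = {
        "cause": cause,
        "cause_name": SCAUSE_NAMES.get(cause, "unknown"),
        "badaddr": badaddr,
    }
    if cause == 2:
        # Assumption: badaddr holds the faulting instruction bits.
        bits = badaddr & 0xFFFFFFFF
        info["is_uncompressed"] = (bits & 0b11) == 0b11  # low bits 11
        info["opcode"] = bits & 0x7F                     # major opcode field
    return info


if __name__ == "__main__":
    line = ("[    0.000000] status: 0000000200000100 "
            "badaddr: 000000000513f187 cause: 0000000000000002")
    print(decode_status_line(line))
```

For the 1014 log this reports cause 2, "Illegal instruction", matching the "Oops - illegal instruction" message.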

You can get linux-image-5.11.0-1012-generic here:

http://ports.ubuntu.com/ubuntu-ports/pool/main/l/linux-riscv/


I’ve lodged a ticket at the following link

My findings are published here

https://people.canonical.com/~xnox/lp1934548/

Essentially some kernel builds hit Illegal instruction during ftrace_init() as observed during a bad boot with a few debug statements added.

Yet if one adds a few more debug print statements, it boots fine, as seen in the good boot.

There is no kexec/kdump facility available, and I don’t know how to gdb the kernel with jtag. I have built the good & bad kernels with debug symbols, and attached the reference base image. Swapping in vmlinuz-5.11* alone from bad/good boots triggers failed/successful boots.

At this point I don’t know whether a CPU bug / erratum is being reproduced, or whether some toolchain bug is failing to align something. I’m also not sure how to debug and investigate this issue further.

Please see the boot logs attached there, and how minimal the diff between good & bad build is.

It does kind of mean that arbitrary kernel builds may or may not be bootable on Unmatched, in the v5.11 series at least.

@davidlt any ideas as to what the above could be?

This is fixed upstream in v5.12 kernels and later

Fixed in Ubuntu v5.11 kernels

Also submitted the fix to v5.10.y stable tree
