Unexpected Bugs and Oops (Virtual memory related) with Ubuntu 21.04 and Kernel 5.11.10 shown on console

We have experienced repeated crashes on the Unmatched running Ubuntu 21.04 while testing with an NVMe SSD. Details follow.

We would like to know whether anyone else has seen these issues and what the recommended next step would be, e.g. should we file a bug, and if so, where?

Thanks!

Output of uname -a:
Linux riscv 5.11.10 #1 SMP Wed Apr 7 17:37:34 UTC 2021 riscv64 riscv64 riscv64 GNU/Linux
per: https://blogjawn.stufftoread.com/install-ubuntu-on-hifive-unmatched.html

Message on console at time of latest crash:

[14988.302143] BUG: Bad page state in process cp pfn:27e92c
[14988.306805] page:00000000b10bfc10 refcount:0 mapcount:0 mapping:00000000bdf980c7 index:0x1 pfn:0x27e92c
[14988.316184] failed to read mapping contents, not a valid kernel address?
[14988.322867] flags: 0x4000000000000000()
[14988.326693] raw: 4000000000000000 0000000000000100 0000000000000122 0000000000610000
[14988.334420] raw: 0000000000000001 0000000000000000 00000000ffffffff
[14988.340673] page dumped because: non-NULL mapping
[14988.345363] Modules linked in: bluetooth ecdh_generic ecc fuse
[14988.351185] CPU: 1 PID: 5826 Comm: cp Not tainted 5.11.10 #1
[14988.356830] Call Trace:
[14988.359262] [] walk_stackframe+0x0/0xaa
[14988.364646] Disabling lock debugging due to kernel taint
[14988.369952] Unable to handle kernel paging request at virtual address 00000000001c03b4
[14988.377850] Oops [#1]
[14988.380106] Modules linked in: bluetooth ecdh_generic ecc fuse
[14988.385927] CPU: 1 PID: 5826 Comm: cp Tainted: G B 5.11.10 #1
[14988.392961] epc: ffffffe00013f2ce ra : ffffffe00013f534 sp : ffffffe0b8563900
[14988.400083] gp : ffffffe000e56510 tp : ffffffe082299d40 t0 : ffffffe000e6803f
[14988.407292] t1 : ffffffe000e68030 t2 : 0000000000000000 s0 : ffffffe0b8563a50
[14988.414502] s1 : 0000000000000010 a0 : 0000000000000001 a1 : 00000000001c03ac
[14988.421711] a2 : ffffffe000e25fe0 a3 : 0000000000000002 a4 : ffffffcf07113ba0
[14988.428920] a5 : ffffffe000e25f10 a6 : ffffffcf08bb0250 a7 : ffffffe000e25f50
[14988.436129] s2 : ffffffcf07113ba8 s3 : 0000000000000010 s4 : 0000000000000021
[14988.443339] s5 : 0000000000000022 s6 : ffffffe000e25e80 s7 : ffffffe3fecd1650
[14988.450547] s8 : 0000000000000001 s9 : ffffffe3fecd1640 s10: 00000000000000d0
[14988.457757] s11: 000000000000003f t3 : ffffffffffffffff t4 : ffffffffffffffff
[14988.464966] t5 : 0000000000000000 t6 : ffffffe0b85635d8
[14988.470265] status: 0000000200000100 badaddr: 00000000001c03b4 cause: 000000000000000f
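
For anyone reading the register dump above, the `cause` value can be decoded with the exception-code table from the RISC-V privileged specification. The following is only a minimal sketch: the function name `decode_cause` is made up for illustration, and it assumes a 64-bit `scause` in which the top bit marks an interrupt.

```python
# Exception codes for scause (interrupt bit clear), as listed in the
# RISC-V privileged specification. Codes not listed here are reserved.
SCAUSE_EXCEPTIONS = {
    0: "Instruction address misaligned",
    1: "Instruction access fault",
    2: "Illegal instruction",
    3: "Breakpoint",
    4: "Load address misaligned",
    5: "Load access fault",
    6: "Store/AMO address misaligned",
    7: "Store/AMO access fault",
    8: "Environment call from U-mode",
    9: "Environment call from S-mode",
    12: "Instruction page fault",
    13: "Load page fault",
    15: "Store/AMO page fault",
}


def decode_cause(hex_value: str) -> str:
    """Decode the hex 'cause' field printed in a RISC-V Oops (hypothetical helper)."""
    value = int(hex_value, 16)
    if value >> 63:  # top bit set: this is an interrupt, not an exception
        return f"interrupt {value & ((1 << 63) - 1)}"
    return SCAUSE_EXCEPTIONS.get(value, f"reserved/unknown ({value})")


# The value from the Oops above.
print(decode_cause("000000000000000f"))  # prints "Store/AMO page fault"
```

So the Oops above was raised by a store (or atomic) to the address in `badaddr`, which is what "Unable to handle kernel paging request" reports.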

Previous “Oops” and “BUG” messages were the following:

Oops:

[20102.018571] Unable to handle kernel paging request at virtual address 0000000000092dd2
[20102.026467] Oops [#1]
[66085.787478] Unable to handle kernel paging request at virtual address ffffffcf0ffaaee0
[66085.794685] Oops [#1]
[67034.739580] Unable to handle kernel paging request at virtual address 0000000300000007
[67034.746778] Oops [#2]
[67034.854799] Unable to handle kernel paging request at virtual address 0000000300000007
[67034.861989] Oops [#3]
[67034.977323] Unable to handle kernel paging request at virtual address 0000000300000007
[67034.984512] Oops [#4]
[67035.099730] Unable to handle kernel paging request at virtual address 0000000300000007
[67035.106921] Oops [#5]
[67035.213562] Unable to handle kernel paging request at virtual address 0000000300000007
[67035.220748] Oops [#6]
[67035.366808] Unable to handle kernel paging request at virtual address 0000000300000007
[67035.374000] Oops [#7]
[67036.600507] Unable to handle kernel paging request at virtual address 0000000300000007
[67036.607703] Oops [#8]
[67036.955138] Unable to handle kernel paging request at virtual address 0000000300000007
[67036.962329] Oops [#9]
[ 2656.095693] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2656.103746] Oops [#1]
[ 2656.208036] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2656.216227] Oops [#2]
[ 2656.323840] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2656.331893] Oops [#3]
[ 2658.309943] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2658.317988] Oops [#4]
[ 2658.424663] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2658.432715] Oops [#5]
[ 2658.542621] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2658.550678] Oops [#6]
[ 2658.667070] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2658.675123] Oops [#7]
[ 2658.783085] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2658.791137] Oops [#8]
[ 2658.916115] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2658.924166] Oops [#9]
[ 2659.031900] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2659.039948] Oops [#10]
[ 2659.145840] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2659.153892] Oops [#11]
[ 2659.268864] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2659.276922] Oops [#12]
[ 2659.385518] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2659.393571] Oops [#13]
[ 2659.502593] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2659.510644] Oops [#14]
[ 2659.621318] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2659.629372] Oops [#15]
[ 2659.739086] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2659.747138] Oops [#16]
[ 2659.859561] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2659.867631] Oops [#17]
[ 2659.977469] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2659.985522] Oops [#18]
[ 2660.091813] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2660.099864] Oops [#19]
[ 2660.215257] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2660.223310] Oops [#20]
[ 2660.334792] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2660.342846] Oops [#21]
[ 2660.452556] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2660.460609] Oops [#22]
[ 2660.572405] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2660.580455] Oops [#23]
[ 2660.695250] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2660.703315] Oops [#24]
[ 2660.811023] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2660.819067] Oops [#25]
[ 2660.925159] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2660.933207] Oops [#26]
[ 2661.040717] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2661.048766] Oops [#27]
[ 2661.737476] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2661.745527] Oops [#28]
[ 2661.851742] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2661.859789] Oops [#29]
[ 2661.965820] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2661.974439] Oops [#30]
[ 2662.124203] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2662.132255] Oops [#31]
[ 2662.239623] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2662.247665] Oops [#32]
[ 2662.773313] Oops [#33]
[ 2663.541064] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2663.549114] Oops [#34]
[ 2668.276706] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2668.284756] Oops [#35]
[ 2708.280460] Unable to handle kernel NULL pointer dereference at virtual address 000000000000011a
[ 2708.288523] Oops [#36]
[46037.395603] Unable to handle kernel paging request at virtual address ffffffcf012343a8
[46037.402795] Oops [#1]
[14988.369952] Unable to handle kernel paging request at virtual address 00000000001c03b4
[14988.377850] Oops [#1]
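
A list like the one above is easier to compare across boots when it is grouped by fault address. This is a rough sketch of how that could be done on saved console output. The helper name `summarize_faults` is hypothetical, and the regular expression only covers the two message forms that appear in this post.

```python
import re
from collections import Counter

# Matches only the two message forms seen in the log excerpts in this thread.
FAULT_RE = re.compile(
    r"Unable to handle kernel (?P<kind>NULL pointer dereference|paging request) "
    r"at virtual address (?P<addr>[0-9a-f]+)"
)


def summarize_faults(log_text: str) -> Counter:
    """Count faults per (kind, address) pair (hypothetical helper)."""
    counts = Counter()
    for match in FAULT_RE.finditer(log_text):
        counts[(match.group("kind"), int(match.group("addr"), 16))] += 1
    return counts


# A few lines copied from the list above.
sample = """\
[67034.739580] Unable to handle kernel paging request at virtual address 0000000300000007
[67034.854799] Unable to handle kernel paging request at virtual address 0000000300000007
[ 2656.095693] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
"""

for (kind, addr), n in summarize_faults(sample).most_common():
    print(f"{n:3d}  {kind:26s} 0x{addr:016x}")
```

The same function can be fed the full output of `dmesg` or a saved serial console capture.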

Previous BUGs:

[20101.947634] BUG: Bad page state in process diff pfn:2513f4
[67034.968417] BUG: Bad rss-counter state mm:000000009b927fb1 type:MM_ANONPAGES val:1
[67035.090768] BUG: Bad rss-counter state mm:000000009b927fb1 type:MM_ANONPAGES val:1
[ 2656.437087] BUG: Bad page state in process kworker/1:2 pfn:3a85ac
[ 2656.510559] BUG: Bad page state in process kworker/1:2 pfn:1c71b8
[ 2656.585433] BUG: Bad page state in process kworker/1:2 pfn:1c71b9
[ 2656.660304] BUG: Bad page state in process kworker/1:2 pfn:3b5288
[ 2656.735175] BUG: Bad page state in process kworker/1:2 pfn:3b5289
[ 2656.810048] BUG: Bad page state in process kworker/1:2 pfn:2c2d06
[ 2656.884915] BUG: Bad page state in process kworker/1:2 pfn:2c2d07
[ 2656.959788] BUG: Bad page state in process kworker/1:2 pfn:3cd29a
[ 2657.034657] BUG: Bad page state in process kworker/1:2 pfn:3cd29b
[ 2657.109529] BUG: Bad page state in process kworker/1:2 pfn:3d0f9e
[ 2657.184402] BUG: Bad page state in process kworker/1:2 pfn:3d0f9f
[ 2657.259271] BUG: Bad page state in process kworker/1:2 pfn:2fbc70
[ 2657.334144] BUG: Bad page state in process kworker/1:2 pfn:2fbc71
[ 2657.409015] BUG: Bad page state in process kworker/1:2 pfn:36e81e
[ 2657.483886] BUG: Bad page state in process kworker/1:2 pfn:36e81f
[ 2657.558759] BUG: Bad page state in process kworker/1:2 pfn:381716
[ 2657.633628] BUG: Bad page state in process kworker/1:2 pfn:381717
[ 2657.708499] BUG: Bad page state in process kworker/1:2 pfn:366d1c
[ 2657.783375] BUG: Bad page state in process kworker/1:2 pfn:366d1d
[ 2657.858243] BUG: Bad page state in process kworker/1:2 pfn:1e5996
[ 2657.933113] BUG: Bad page state in process kworker/1:2 pfn:1e5997
[ 2658.007985] BUG: Bad page state in process kworker/1:2 pfn:39f172
[ 2658.082857] BUG: Bad page state in process kworker/1:2 pfn:39f173
[ 2658.157727] BUG: Bad page state in process kworker/1:2 pfn:3303b4
[ 2658.232599] BUG: Bad page state in process kworker/1:2 pfn:3303b5
[ 2678.092188] BUG: Bad page state in process (Xserver) pfn:1fa5ce
[ 2678.160684] BUG: Bad page state in process (Xserver) pfn:1fa5cf
[ 2678.229909] BUG: Bad page state in process (Xserver) pfn:11ab2a
[ 2678.299135] BUG: Bad page state in process (Xserver) pfn:11ab2b
[ 2678.368361] BUG: Bad page state in process (Xserver) pfn:3a6d72
[ 2678.437586] BUG: Bad page state in process (Xserver) pfn:3a6d73
[ 2678.506810] BUG: Bad page state in process (Xserver) pfn:3959af
[ 2678.576038] BUG: Bad page state in process (Xserver) pfn:36fc21
[ 2678.645270] BUG: Bad page state in process (Xserver) pfn:257c16
[ 2678.714487] BUG: Bad page state in process (Xserver) pfn:257c17
[ 2678.783714] BUG: Bad page state in process (Xserver) pfn:37228e
[ 2678.852942] BUG: Bad page state in process (Xserver) pfn:37228f
[ 2678.922168] BUG: Bad page state in process (Xserver) pfn:29c912
[ 2678.991391] BUG: Bad page state in process (Xserver) pfn:29c913
[ 2679.060617] BUG: Bad page state in process (Xserver) pfn:18689e
[ 2679.129846] BUG: Bad page state in process (Xserver) pfn:18689f
[ 2679.199069] BUG: Bad page state in process (Xserver) pfn:3f6b96
[ 2679.268293] BUG: Bad page state in process (Xserver) pfn:3f6b97
[ 2679.337518] BUG: Bad page state in process (Xserver) pfn:338de0
[ 2679.406744] BUG: Bad page state in process (Xserver) pfn:338de1
[ 2679.475974] BUG: Bad page state in process (Xserver) pfn:2e1944
[ 2679.545867] BUG: Bad page state in process sleep pfn:2e1945
[ 2679.613726] BUG: Bad page state in process sleep pfn:3f8cb6
[ 2679.682263] BUG: Bad page state in process sleep pfn:3f8cb7
[ 2679.750788] BUG: Bad page state in process sleep pfn:3be90e
[ 2679.819319] BUG: Bad page state in process sleep pfn:3be90f
[ 2679.887850] BUG: Bad page state in process sleep pfn:1cd27c
[ 2679.956380] BUG: Bad page state in process sleep pfn:1cd27d
[ 2680.024912] BUG: Bad page state in process sleep pfn:39a658
[ 2680.093443] BUG: Bad page state in process sleep pfn:39a659
[ 2680.161974] BUG: Bad page state in process sleep pfn:466416
[ 2680.230505] BUG: Bad page state in process sleep pfn:466417
[ 2680.299034] BUG: Bad page state in process sleep pfn:44dda0
[ 2680.367568] BUG: Bad page state in process sleep pfn:44dda1
[ 2680.415217] BUG: Bad page state in process sleep pfn:11ab38
[14988.302143] BUG: Bad page state in process cp pfn:27e92c
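
Many of the PFNs listed above come in consecutive pairs (for example 1c71b8 and 1c71b9). A small script can group the "Bad page state" lines by process and count such pairs, which may be useful detail for a bug report. This is only a sketch: both helper names are hypothetical, and it ignores the "Bad rss-counter state" lines.

```python
import re

# Only "Bad page state" lines are parsed; other BUG messages are skipped.
BAD_PAGE_RE = re.compile(
    r"BUG: Bad page state in process (?P<proc>.+?) pfn:(?P<pfn>[0-9a-f]+)"
)


def bad_pages_by_process(log_text: str) -> dict:
    """Map process name -> list of reported PFNs, in log order (hypothetical helper)."""
    by_proc = {}
    for m in BAD_PAGE_RE.finditer(log_text):
        by_proc.setdefault(m.group("proc"), []).append(int(m.group("pfn"), 16))
    return by_proc


def adjacent_pairs(pfns: list) -> int:
    """Count reports whose PFN is exactly one above the previous report's PFN."""
    return sum(1 for a, b in zip(pfns, pfns[1:]) if b == a + 1)


# A few lines copied from the list above.
sample = """\
[ 2656.510559] BUG: Bad page state in process kworker/1:2 pfn:1c71b8
[ 2656.585433] BUG: Bad page state in process kworker/1:2 pfn:1c71b9
[ 2656.660304] BUG: Bad page state in process kworker/1:2 pfn:3b5288
[ 2656.735175] BUG: Bad page state in process kworker/1:2 pfn:3b5289
[ 2678.092188] BUG: Bad page state in process (Xserver) pfn:1fa5ce
"""

for proc, pfns in bad_pages_by_process(sample).items():
    print(f"{proc}: {len(pfns)} bad pages, {adjacent_pairs(pfns)} consecutive pairs")
```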

We have seen similar issues before, and they tend to be some sort of bug in the kernel. Because this is Ubuntu, you should probably report the bug directly to their bug tracker. This seems to be a recent enough wiki page with details on bug reporting for Ubuntu: ReportingBugs - Community Help Wiki

I would also advise testing Ubuntu 21.10. I believe those images (final or testing) are available. See here: Ubuntu 21.10 (Impish Indri)

For the Unmatched, the 21.10 image is this one: https://cdimage.ubuntu.com/releases/21.10/release/ubuntu-21.10-preinstalled-server-riscv64+unmatched.img.xz

Thanks David! Excellent pointers to Ubuntu 21.10! Will do that.

BTW, since this was my first post, it was hidden for the weekend and only cleared today. In the meantime, the discussion in this related topic, with your input and others', has progressed further over the weekend: https://forums.sifive.com/t/intermittent-kernel-oops-under-heavy-load/5009/13

So I will track all further action on this issue in that other thread, since it has more information than this one (mitigations, etc.).

Thanks for your help!