I get kernel oopses like these under heavy load:
[36126.041244] Unable to handle kernel paging request at virtual address ffffffe0011cc010
[36126.048452] Oops [#1]
[36126.050697] Modules linked in: amdgpu mfd_core gpu_sched backlight snd_hda_codec_hdmi drm_ttm_helper snd_hda_intel snd_intel_dspcfg snd_hda_codec ttm snd_hda_core fuse
[36126.065642] CPU: 1 PID: 39422 Comm: … Not tainted 5.12.4 #1
[36126.071804] Hardware name: SiFive HiFive Unmatched (DT)
[36126.077017] epc : get_page_from_freelist+0x79e/0xdc4
[36126.081968] ra : get_page_from_freelist+0x71c/0xdc4
[36126.086919] epc : ffffffe00017a8b4 ra : ffffffe00017a832 sp : ffffffe0fc33bbe0
[36126.094127] gp : ffffffe001a1b770 tp : ffffffe0fc322800 t0 : ffffffffffffffff
[36126.101338] t1 : ffffffe3fec8ee40 t2 : ffffffe001000218 s0 : ffffffe0fc33bd20
[36126.108546] s1 : 0000000000000010 a0 : ffffffe0019cc320 a1 : ffffffcf057226b0
[36126.115754] a2 : ffffffe0011cc010 a3 : 0000000000000000 a4 : ffffffe0019cbf50
[36126.122964] a5 : ffffffe0019cbf40 a6 : 00000000000000d0 a7 : ffffffe001a57770
[36126.130173] s2 : 000000000000003f s3 : 0000000000000010 s4 : ffffffe3fec8ee40
[36126.137383] s5 : ffffffe0019cbf40 s6 : 0000000000000000 s7 : ffffffe3fec8ee30
[36126.144592] s8 : 0000000000000001 s9 : 00000000000000d0 s10: ffffffcf03c669e8
[36126.151801] s11: 0000000000000000 t3 : 0000000000000000 t4 : 0000000000000000
[36126.159010] t5 : ffffffcf03c669e0 t6 : 0000000000000171
[36126.164307] status: 0000000200000100 badaddr: ffffffe0011cc010 cause: 000000000000000f
[36126.172213] Call Trace:
[36126.174644] [<ffffffe00017a8b4>] get_page_from_freelist+0x79e/0xdc4
[36126.180900] [<ffffffe00017bd74>] __alloc_pages_nodemask+0xf4/0x1a4
[36126.187066] [<ffffffe000165d2a>] __handle_mm_fault+0x3c8/0xa08
[36126.192885] [<ffffffe0001663e4>] handle_mm_fault+0x7a/0x100
[36126.198444] [<ffffffe000009bc8>] do_page_fault+0x128/0x426
[36126.203917] [<ffffffe0000039ea>] ret_from_exception+0x0/0xc
[36126.209654] ---[ end trace 292e34538dff91f5 ]---
and
[ 5331.322051] Unable to handle kernel access to user memory without uaccess routines at virtual address 0000000000000008
[ 5331.332050] Oops [#1]
[ 5331.334291] Modules linked in: amdgpu snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec mfd_core gpu_sched backlight drm_ttm_helper snd_hda_core ttm fuse
[ 5331.349238] CPU: 1 PID: 21644 Comm: … Not tainted 5.12.4 #1
[ 5331.355401] Hardware name: SiFive HiFive Unmatched (DT)
[ 5331.360614] epc : get_page_from_freelist+0x19e/0xdc4
[ 5331.365563] ra : __alloc_pages_nodemask+0xf4/0x1a4
[ 5331.370428] epc : ffffffe00017a2b4 ra : ffffffe00017bd74 sp : ffffffe0893b3be0
[ 5331.377637] gp : ffffffe001a1b770 tp : ffffffe0893e4600 t0 : ffffffe000009aa0
[ 5331.384845] t1 : ffffffe3fec8ee40 t2 : ffffffe001000218 s0 : ffffffe0893b3d20
[ 5331.392055] s1 : 0000000000000010 a0 : 0000000000100cca a1 : 0000000000000000
[ 5331.399265] a2 : 0000000000001bf5 a3 : 0000000000000000 a4 : ffffffe3fec8ee40
[ 5331.406474] a5 : ffffffcf0367e660 a6 : 0000000000000cc0 a7 : ffffffe001a57770
[ 5331.413683] s2 : ffffffe3fec8ee20 s3 : ffffffcf0367e658 s4 : ffffffe3fec8ee30
[ 5331.420893] s5 : ffffffe0019cbf40 s6 : ffffffe0019cc8c0 s7 : ffffffe0893b3d28
[ 5331.428100] s8 : 0000000000000000 s9 : 0000000000000000 s10: 0000000000000001
[ 5331.435310] s11: 0000000000000001 t3 : 0000003fc6df6494 t4 : 000000000000000f
[ 5331.442519] t5 : 0000000000000001 t6 : 0000000000000000
[ 5331.447816] status: 0000000200000100 badaddr: 0000000000000008 cause: 000000000000000f
[ 5331.455721] Call Trace:
[ 5331.458153] [<ffffffe00017a2b4>] get_page_from_freelist+0x19e/0xdc4
[ 5331.464409] [<ffffffe00017bd74>] __alloc_pages_nodemask+0xf4/0x1a4
[ 5331.470576] [<ffffffe000165d2a>] __handle_mm_fault+0x3c8/0xa08
[ 5331.476393] [<ffffffe0001663e4>] handle_mm_fault+0x7a/0x100
[ 5331.481955] [<ffffffe000009bc8>] do_page_fault+0x128/0x426
[ 5331.487425] [<ffffffe0000039ea>] ret_from_exception+0x0/0xc
[ 5331.493079] ---[ end trace 4e418ad0738e1f80 ]---
and
[12094.443042] BUG: Bad page state in process … pfn:1d7cec
[12094.448241] page:0000000005166fc2 refcount:0 mapcount:0 mapping:000000008b33616d index:0x1 pfn:0x1d7cec
[12094.457623] failed to read mapping contents, not a valid kernel address?
[12094.464305] flags: 0x4000000000080000(swapbacked)
[12094.469001] raw: 4000000000080000 0000000000000100 0000000000000122 0000000000650000
[12094.476728] raw: 0000000000000001 0000000000000000 00000000ffffffff
[12094.482980] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag(s) set
[12094.489406] Modules linked in: amdgpu snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec mfd_core gpu_sched snd_hda_core backlight drm_ttm_helper ttm fuse
[12094.504349] CPU: 1 PID: 93058 Comm: … Not tainted 5.12.4 #1
[12094.510511] Hardware name: SiFive HiFive Unmatched (DT)
[12094.515720] Call Trace:
[12094.518153] [<ffffffe00000576a>] walk_stackframe+0x0/0xe2
[12094.523539] [<ffffffe000ac2bf4>] dump_backtrace+0x4c/0x5a
[12094.528922] [<ffffffe000ac2c26>] show_stack+0x24/0x2c
[12094.533959] [<ffffffe000ac9fa6>] dump_stack+0x7a/0x94
[12094.538997] [<ffffffe00017797e>] bad_page+0xf8/0x11e
[12094.543947] [<ffffffe00017ab04>] get_page_from_freelist+0x9ee/0xdc4
[12094.550202] [<ffffffe00017bd74>] __alloc_pages_nodemask+0xf4/0x1a4
[12094.556368] [<ffffffe000165d2a>] __handle_mm_fault+0x3c8/0xa08
[12094.562188] [<ffffffe0001663e4>] handle_mm_fault+0x7a/0x100
[12094.567746] [<ffffffe000009bc8>] do_page_fault+0x128/0x426
[12094.573219] [<ffffffe0000039ea>] ret_from_exception+0x0/0xc
[12094.578779] Disabling lock debugging due to kernel taint
[12094.584086] Unable to handle kernel paging request at virtual address ffffff8000000008
and
[12130.749844] Unable to handle kernel paging request at virtual address ffffffe0016cc0a0
[12130.757046] Oops [#1]
[12130.759292] Modules linked in: fuse
[12130.762768] CPU: 0 PID: 86990 Comm: python3 Not tainted 5.12.4 #1
[12130.768848] Hardware name: SiFive HiFive Unmatched (DT)
[12130.774061] epc : get_page_from_freelist+0x79e/0xdc4
[12130.779010] ra : get_page_from_freelist+0x71c/0xdc4
[12130.783959] epc : ffffffe00017a8b4 ra : ffffffe00017a832 sp : ffffffe099d47be0
[12130.791170] gp : ffffffe001a1b770 tp : ffffffe0fbbec600 t0 : 0000000000000001
[12130.798379] t1 : ffffffe3fec73e40 t2 : ffffffe001000218 s0 : ffffffe099d47d20
[12130.805590] s1 : 0000000000000010 a0 : 0000000000000000 a1 : ffffffcf0b0000c8
[12130.812798] a2 : ffffffe0016cc0a0 a3 : 0000000000000002 a4 : ffffffe0019cbfe0
[12130.820007] a5 : ffffffe0019cbfd0 a6 : ffffffe0019cc010 a7 : ffffffe001a57770
[12130.827216] s2 : 000000000000003f s3 : 0000000000000010 s4 : ffffffe3fec73e40
[12130.834425] s5 : ffffffe0019cbf40 s6 : 0000000000000003 s7 : ffffffe3fec73e30
[12130.841635] s8 : 0000000000000001 s9 : 00000000000000d0 s10: ffffffcf04b3a968
[12130.848843] s11: 0000000000000003 t3 : 0000000000000000 t4 : 0000000000000000
[12130.856052] t5 : ffffffcf04b3a960 t6 : ffffffffffffffff
[12130.861350] status: 0000000200000100 badaddr: ffffffe0016cc0a0 cause: 000000000000000f
[12130.869255] Call Trace:
[12130.871687] [<ffffffe00017a8b4>] get_page_from_freelist+0x79e/0xdc4
[12130.877940] [<ffffffe00017bd74>] __alloc_pages_nodemask+0xf4/0x1a4
[12130.884109] [<ffffffe000165d2a>] __handle_mm_fault+0x3c8/0xa08
[12130.889927] [<ffffffe0001663e4>] handle_mm_fault+0x7a/0x100
[12130.895485] [<ffffffe000009bc8>] do_page_fault+0x128/0x426
[12130.900958] [<ffffffe0000039ea>] ret_from_exception+0x0/0xc
[12130.906641] ---[ end trace 2f8d60d8322c02e5 ]---
and
[44732.714147] page dumped because: bad pte
[44732.718056] addr:0000003fb0099000 vm_flags:00200073 anon_vma:ffffffe09592f058 mapping:0000000000000000 index:3fb0099
[44732.728566] file:(null) fault:0x0 mmap:0x0 readpage:0x0
[44732.733779] CPU: 3 PID: 143969 Comm: b-addcon Tainted: G B D 5.12.4 #1
[44732.741408] Hardware name: SiFive HiFive Unmatched (DT)
[44732.746620] Call Trace:
[44732.749052] [<ffffffe00000576a>] walk_stackframe+0x0/0xe2
[44732.754436] [<ffffffe000ac2bf4>] dump_backtrace+0x4c/0x5a
[44732.759822] [<ffffffe000ac2c26>] show_stack+0x24/0x2c
[44732.764859] [<ffffffe000ac9fa6>] dump_stack+0x7a/0x94
[44732.769897] [<ffffffe000162c9a>] print_bad_pte+0x172/0x1b8
[44732.775370] [<ffffffe0001643dc>] unmap_page_range+0x45e/0x642
[44732.781102] [<ffffffe0001647fe>] unmap_vmas+0x92/0xda
[44732.786140] [<ffffffe00016c028>] exit_mmap+0xb0/0x1ba
[44732.791178] [<ffffffe00000c15c>] mmput+0x4c/0x10a
[44732.795868] [<ffffffe000013054>] do_exit+0x248/0x7fa
[44732.800819] [<ffffffe0000143ec>] do_group_exit+0x3e/0xce
[44732.806117] [<ffffffe00001fb0a>] get_signal+0x1a0/0x828
[44732.811328] [<ffffffe000004b90>] do_notify_resume+0x88/0x356
[44732.816974] [<ffffffe0000039ea>] ret_from_exception+0x0/0xc
I’m not sure if this is some kind of page table corruption or something else. It happens intermittently, not after any specific action.
The system is unresponsive after that. I haven’t changed the clock speed, and the board is mounted in an enclosure with proper airflow (I haven’t seen temperatures above 50°C in sensors). This happens with the 5.12.4 kernel pre-built from the SDK image, and also with the previous 5.11.x kernel.
An AMD RX 550 GPU is installed, but I am not using it at the moment. I have also tried with the GPU disconnected, and the problem is the same. No USB devices are connected. The current workload is headless software testing that uses only the CPU (plus disk and memory).
The file system is a Debian RISC-V rootfs on a Samsung SSD 970 EVO Plus 2TB.
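
For anyone reading the traces, here is a rough sketch of how the `badaddr`/`cause` fields in the `status:` lines above can be decoded. The table holds the standard exception codes from the RISC-V privileged specification; the names `SCAUSE_EXCEPTIONS` and `decode_status_line` are made up for this example and are not part of any kernel tooling.

```python
import re

# Synchronous exception codes from the RISC-V privileged specification
# (scause with the interrupt bit clear). Codes not listed are reserved
# or not relevant here.
SCAUSE_EXCEPTIONS = {
    0: "instruction address misaligned",
    1: "instruction access fault",
    2: "illegal instruction",
    3: "breakpoint",
    4: "load address misaligned",
    5: "load access fault",
    6: "store/AMO address misaligned",
    7: "store/AMO access fault",
    8: "environment call from U-mode",
    9: "environment call from S-mode",
    12: "instruction page fault",
    13: "load page fault",
    15: "store/AMO page fault",
}

# Matches the tail of a "status: ... badaddr: ... cause: ..." oops line.
STATUS_LINE = re.compile(r"badaddr: ([0-9a-f]+) cause: ([0-9a-f]+)")


def decode_status_line(line):
    """Return (badaddr, description) for an oops status line, or None."""
    match = STATUS_LINE.search(line)
    if match is None:
        return None
    badaddr = int(match.group(1), 16)
    cause = int(match.group(2), 16)
    if cause >> 63:
        # Top bit set means an interrupt rather than an exception.
        return badaddr, "interrupt"
    return badaddr, SCAUSE_EXCEPTIONS.get(cause, "unknown/reserved")


if __name__ == "__main__":
    sample = ("[36126.164307] status: 0000000200000100 "
              "badaddr: ffffffe0011cc010 cause: 000000000000000f")
    addr, what = decode_status_line(sample)
    print(hex(addr), what)
```

All three oopses above report cause 0xf, which this table maps to a store/AMO page fault.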