Intermittent kernel oops under heavy load

There doesn’t seem to be anything wrong with the board’s RAM: running a (single-threaded) memory tester for a few days didn’t uncover a single problem.

# memtester 15G
memtester version 4.5.0 (64-bit)
Copyright (C) 2001-2020 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 15360MB (16106127360 bytes)
got  15360MB (16106127360 bytes), trying mlock ...locked.
Loop 1:
  Stuck Address       : ok
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok
  Block Sequential    : ok
  Checkerboard        : ok
  Bit Spread          : ok
  Bit Flip            : setting 216

I can’t really think of anything else to try. I could try removing the NVMe drive, but that would be rather inconvenient, and it might stop triggering the problem simply because I/O would be a lot slower, rather than helping to narrow down the underlying issue. Another option would be downclocking the CPU.
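
For the downclocking idea, here is a minimal sketch, assuming the board runs Linux with the cpufreq sysfs interface available under /sys/devices/system/cpu. The function names and the optional root-directory argument are hypothetical helpers made up for illustration, not part of any existing tool, and I have not verified this on the board in question.

```shell
#!/bin/sh
# Cap the maximum frequency of every CPU via the cpufreq sysfs interface.
# Usage: cap_cpu_freq <max_khz> [sysfs_cpu_root]
# The optional second argument is an assumption of this sketch; it only
# exists so the function can be tried against a scratch directory.
cap_cpu_freq() {
    max_khz=$1
    root=${2:-/sys/devices/system/cpu}
    for policy in "$root"/cpu[0-9]*/cpufreq; do
        # Skip CPUs without a cpufreq directory (or an unmatched glob).
        [ -f "$policy/scaling_max_freq" ] || continue
        echo "$max_khz" > "$policy/scaling_max_freq"
    done
}

# Print the current cap of every CPU, one "path: value" line each.
show_cpu_freq_caps() {
    root=${1:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_max_freq; do
        [ -f "$f" ] || continue
        printf '%s: %s\n' "$f" "$(cat "$f")"
    done
}
```

Run as root, something like `cap_cpu_freq 1200000` would cap all cores at 1.2 GHz (values are in kHz; the number here is only an example and should lie between the board’s cpuinfo_min_freq and cpuinfo_max_freq). `show_cpu_freq_caps` can then confirm the setting before re-running the workload that triggers the oops.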