AMDGPU oops during hardware initialization

Having some interesting issues on my Unmatched board when bringing a cheap GPU online. I have a discussion going on in parallel with the AMDGPU devs here:

The interesting bit is here:

Is the PCIe bus on this platform cache coherent with the CPU? What the test does is write commands to a ring buffer in system memory and trigger the GPU to start consuming the commands. The GPU consumes the commands, which tell it to write a specific value to a register. The driver then checks the register to make sure the GPU processed the commands and updated the register. If the PCIe bus is not cache coherent, the snoop of the CPU cache from the GPU might not work and the GPU will fetch garbage if the data has not hit memory yet. I’m not too familiar with RISC-V hardware, but I know on some ARM platforms, the PCIe bus is not cache coherent if the platform vendor has not included the necessary IPs in their ARM design. RISC-V might be similar.

As far as I can tell from reading the FU740 manual, the PCIe configuration should be cache coherent, but I’m far from an expert on the matter.
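
To make the failure mode in the quoted reply concrete, here is a toy Python model of the ring test. It is only a sketch under loud assumptions: it is not amdgpu code, and the class, the command format, and the values `POISON` and `MAGIC` are all invented for illustration. It models a write-back CPU cache, a device that reads the ring over the bus, and a flag for whether the bus can snoop the CPU cache.

```python
# Toy model only: NOT amdgpu code. Every name and value here is hypothetical.
POISON = 0x0BAD0BAD  # arbitrary stand-in for stale memory / untouched register
MAGIC = 0x12345678   # arbitrary value the ring command asks the GPU to write


class ToySystem:
    def __init__(self, coherent):
        self.coherent = coherent    # can the device snoop the CPU cache?
        self.memory = {}            # system RAM: address -> value
        self.cpu_cache = {}         # dirty CPU cache lines not yet written back
        self.scratch_reg = POISON   # GPU register checked by the test

    def cpu_write(self, addr, value):
        # With a write-back cache, CPU stores land in the cache first.
        self.cpu_cache[addr] = value

    def cpu_flush(self):
        # Explicitly write dirty lines back to RAM.
        self.memory.update(self.cpu_cache)
        self.cpu_cache.clear()

    def gpu_read(self, addr):
        # A coherent bus sees dirty CPU cache lines; a non-coherent one sees only RAM.
        if self.coherent and addr in self.cpu_cache:
            return self.cpu_cache[addr]
        return self.memory.get(addr, POISON)

    def gpu_consume_ring(self, base):
        # Invented command format: ("WRITE_REG", value). Anything else is ignored.
        cmd = self.gpu_read(base)
        if isinstance(cmd, tuple) and cmd[0] == "WRITE_REG":
            self.scratch_reg = cmd[1]


def ring_test(coherent, flush=False):
    system = ToySystem(coherent)
    system.cpu_write(0, ("WRITE_REG", MAGIC))  # driver queues the command
    if flush:
        system.cpu_flush()                     # software-managed coherency
    system.gpu_consume_ring(0)                 # GPU fetches and executes it
    return system.scratch_reg == MAGIC         # driver checks the register
```

In this model `ring_test(coherent=True)` passes, `ring_test(coherent=False)` fails because the GPU reads stale memory, and `ring_test(coherent=False, flush=True)` passes again. That last case is roughly what a driver or DMA layer has to arrange on a platform where the bus does not snoop, assuming the real hardware behaves anything like this simplification.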
