I haven’t run any PCIe-related benchmarks. This change only touches the HW prefetcher settings, so I don’t expect it to change PCIe behavior, at least not directly. One thing that could really hurt PCIe performance is that the EIC7700 doesn’t use cache-coherent DMA on any peripheral. I think this issue has been discussed before:
- PCIe DMA coherent problems
- Is the Ethernet DMA coherent or not with the riscv cores on the P550 board?
In short, data written by PCIe DMA is not directly visible to the CPU. The Linux kernel/driver accesses the DMA buffer in one of two ways:
1. After the DMA completes, flush the cache lines containing the DMA buffer, so the CPU rereads them from memory. This essentially “pulls” in the changes made by the device.
2. Use the uncached window to access the DMA buffer directly in memory (cache bypass).
For your workload, I think it’s worth checking whether 1 is better than 2. With 2, the CPU is not permitted to do any caching or cache prefetching, so every read/write is literally a memory access. With 1, you pay the penalty up front, perhaps even a little more, because flushing individual lines is inefficient for a large region and you’d just flush the whole cache instead. However, later accesses will be much faster, since the cache and prefetcher can kick in.
You may need to dig into the Linux source to change the strategy (1 or 2). In general, I think this is a very noticeable shortcoming of the EIC770x SoC (P550/U84 core). Years ago, StarFive’s JH7100 (U74 core) suffered from the exact same issue, and StarFive later released the second-generation JH7110 (also U74 core) with cache-coherent high-speed peripherals (PCIe/GMAC). Despite all that past experience, we have to deal with it all over again. The problem isn’t with SiFive, but with SoC vendors not doing it properly.