PCIe DMA coherent problems

Hi all, I am trying to use the Tenstorrent Grayskull AI accelerator through the PCIe connection (even if it is extremely slow). With the help of the TT people, I modified their code to use dma_alloc_coherent with a 1 GB buffer (by default they use a pinned huge page), but I am not sure it is working properly (the BAR memory region shouldn't be a problem either, but I am in doubt at this point, haha). Another solution would be to flush the cache after each write, but I am not sure how to initialize the ccache, or whether ccache_flush64_range would work at all. Any help is more than welcome ^^

dma_alloc_coherent() will return uncached memory, so it is not necessary to perform any cache maintenance when using memory allocated by this function. Can you verify that the memory allocation actually succeeds? dma_alloc_coherent() requires the entire memory region to have contiguous CPU and I/O virtual addresses; since PCIe devices are not behind an IOMMU, this means that you need a CMA reservation large enough to provide a 1 GiB contiguous physical memory buffer.
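
As a quick userspace sanity check before digging into the driver, something like the sketch below can show whether the CMA pool is even big enough. It assumes your kernel is built with CONFIG_CMA, so that /proc/meminfo exposes the CmaTotal and CmaFree fields; the helper names (meminfo_kb, cma_can_hold) are made up for this example. Note that enough free CMA is necessary but not sufficient, since the pool can still be fragmented, so you should also check the pointer returned by dma_alloc_coherent() in the driver itself.

```python
import re

GIB_KB = 1024 * 1024  # 1 GiB expressed in kB, the unit /proc/meminfo uses


def meminfo_kb(text, key):
    """Return the value of `key` in kB, or None if the field is absent."""
    match = re.search(rf"^{re.escape(key)}:\s+(\d+)\s+kB", text, re.MULTILINE)
    return int(match.group(1)) if match else None


def cma_can_hold(text, needed_kb=GIB_KB):
    """True if CmaFree is at least needed_kb (necessary, not sufficient)."""
    free = meminfo_kb(text, "CmaFree")
    return free is not None and free >= needed_kb


if __name__ == "__main__":
    # Read the live values on the machine that hosts the PCIe card.
    with open("/proc/meminfo") as f:
        text = f.read()
    print("CmaTotal kB:", meminfo_kb(text, "CmaTotal"))
    print("CmaFree  kB:", meminfo_kb(text, "CmaFree"))
    print("room for 1 GiB:", cma_can_hold(text))
```

If CmaTotal is missing or well below 1 GiB, the reservation is usually enlarged with the cma= kernel command-line parameter (for example cma=1536M, leaving some headroom above the buffer size), assuming your platform does not fix the CMA size elsewhere, such as in the device tree.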