Hi all, I am trying to use the Tenstorrent Grayskull AI accelerator through the PCIe connection (even if it is extremely slow). With the help of the TT people, their code was changed to use dma_alloc_coherent with a 1 GB buffer (by default they use a pinned HugePage), but I am not sure it is working properly (the BAR memory region shouldn’t be a problem either, but I am in doubt at this point, hahaha). Another solution would be flushing the cache after each write, but I am not sure how to initialize the ccache or whether ccache_flush64_range would work at all. Any help is more than welcome ^^
dma_alloc_coherent() will return uncached memory, so it is not necessary to perform any cache maintenance when using memory allocated by this function. Can you verify that the memory allocation actually succeeds? dma_alloc_coherent() requires the entire memory region to have contiguous CPU and I/O virtual addresses; since PCIe devices are not behind an IOMMU, this means that you need a CMA reservation large enough to provide a 1 GiB contiguous physical memory buffer.
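One quick way to sanity-check the CMA side from user space is to compare the CmaTotal and CmaFree fields of /proc/meminfo against the 1 GiB the buffer needs. The snippet below is only a rough sketch: the helper names parse_meminfo and cma_can_hold are made up for illustration, and it assumes a kernel built with CMA support so that those two fields are present. Enough free CMA is necessary but not sufficient, since the free pages must also be contiguous, so the return value of dma_alloc_coherent() in the driver is still the real answer.

```python
import os
import re

GIB_KB = 1024 * 1024  # 1 GiB expressed in kB, the unit /proc/meminfo reports


def parse_meminfo(text):
    """Return a dict of /proc/meminfo fields that are reported in kB."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"^(\w+):\s+(\d+)\s*kB", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields


def cma_can_hold(text, needed_kb=GIB_KB):
    """Hypothetical helper: is the CMA pool big and free enough for needed_kb?

    Returns (ok, cma_total_kb, cma_free_kb). Missing fields count as 0,
    which is what you would see on a kernel without a CMA reservation.
    """
    fields = parse_meminfo(text)
    total = fields.get("CmaTotal", 0)
    free = fields.get("CmaFree", 0)
    ok = (total >= needed_kb) and (free >= needed_kb)
    return ok, total, free


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        ok, total, free = cma_can_hold(f.read())
    print(f"CmaTotal={total} kB CmaFree={free} kB enough_for_1GiB={ok}")
```

If CmaTotal comes back well below 1048576 kB, the reservation would need to be raised (for example with the cma= kernel command-line parameter) before a 1 GiB coherent allocation has any chance of succeeding.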
Does the EIC7700X SoC use the Front Port of the P550 cluster for DMA coherency?
I think we can pretty much confirm it’s not using the front port. Per the description on the v2 patchset that adds the initial device tree for the EIC7700 to upstream Linux:
Hi Pinkesh,
Thank you for the patches!
Should this not be marked dma-noncoherent to avoid having to mark each
peripheral as such?
Thanks for your feedback.
We have not added “dma-noncoherent” because there are no DMA-capable peripherals in the devicetree yet.
We planned to add this later when we add any DMA capable devices
i.e. sdhci, gmac, sata, pcie, spi.
Do you recommend to add this property in current version?
IIRC, all those DMA capable peripherals are dma-noncoherent.
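To see how the device tree on a running board actually marks its peripherals, one option is to walk the live device tree that Linux exposes under /sys/firmware/devicetree/base and look for dma-coherent or dma-noncoherent property files. This is just a sketch under that assumption; the function name coherency_markers is made up here, and a node with no explicit marker may still inherit its coherency from a parent node or from the platform default, so an empty result does not prove anything by itself.

```python
import os

DT_ROOT = "/sys/firmware/devicetree/base"  # live device tree as exposed by Linux


def coherency_markers(dt_root):
    """Hypothetical helper: map node paths (relative to dt_root) to the
    DMA coherency property found on that node, if any.

    Each device-tree property shows up as a file inside the node's directory,
    so a boolean property such as dma-noncoherent is just a file with that name.
    """
    found = {}
    for dirpath, _dirnames, filenames in os.walk(dt_root):
        for prop in ("dma-coherent", "dma-noncoherent"):
            if prop in filenames:
                found[os.path.relpath(dirpath, dt_root)] = prop
    return found


if __name__ == "__main__" and os.path.isdir(DT_ROOT):
    for node, prop in sorted(coherency_markers(DT_ROOT).items()):
        print(f"{node}: {prop}")
```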