Behaviour of instruction cache next-line prefetcher

Table 20 (SiFive Feature Disable CSR) lists:

Bit 17: Disable instruction cache next-line prefetcher

The manual carries this warning: "A particular Feature Disable CSR bit is only to be used in a very limited number of situations, as detailed in the Example Usage entry in Table 21."

However, is there now more information on feature 17?
How and when is the prefetch currently triggered [if enabled]?
Is the cache line scanned [even partially, as the last instructions loaded by fetch] to determine whether fall-through is possible?
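
For reference, this is roughly how I expect the bit to be toggled; a minimal sketch, assuming the Feature Disable CSR is the machine-mode custom CSR 0x7C1 described in the core complex manuals and that setting bit 17 disables the prefetcher (please correct me if the CSR address or the polarity differs):

```c
/* Sketch only: assumes the SiFive Feature Disable CSR is custom CSR 0x7C1
 * (machine mode) and that setting bit 17 disables the I-cache next-line
 * prefetcher, per the table quoted above. */
#define FD_NEXT_LINE_PREFETCH (1u << 17)

static inline void nextline_prefetch_disable(void)
{
    unsigned long mask = FD_NEXT_LINE_PREFETCH;
    __asm__ volatile ("csrs 0x7c1, %0" : : "r"(mask));  /* set bit 17 */
}

static inline void nextline_prefetch_enable(void)
{
    unsigned long mask = FD_NEXT_LINE_PREFETCH;
    __asm__ volatile ("csrc 0x7c1, %0" : : "r"(mask));  /* clear bit 17 */
}
```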

In part I ask because I am looking at keeping user code out of certain 64-byte-aligned chunks, to effectively double the ITIM feature. Next-line prefetch could potentially muck that up.

The idea is that, for each cache line intended to stay resident, the corresponding (aliasing) code lines are avoided in the application code, except for the one line that is meant to be resident in the cache.
For some uses/applications this means that user code may not use those address ranges, because that code is reserved for machine mode [e.g. the trap handler].
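
To make this concrete, here is a rough sketch of how the resident machine-mode code could be pinned to a single 64-byte chunk with standard gcc/Clang attributes; the section name is my own placeholder, and the final address assignment would be done in the linker script (not shown):

```c
/* Hypothetical layout sketch: keep the resident handler within one
 * 64-byte-aligned chunk so it maps onto exactly one I-cache line.
 * ".mtrap_line" is a placeholder section name; the linker script would
 * place it at the flash address that aliases the reserved cache line. */
#define CACHE_LINE_BYTES 64

__attribute__((section(".mtrap_line"), aligned(CACHE_LINE_BYTES)))
void machine_trap_handler(void)
{
    /* handler body must fit within CACHE_LINE_BYTES of code */
}
```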

Has someone already implemented this?
Have they promoted it to the toolchain?
Does RISC-V gcc or LLVM have a memory-map configuration input that ensures such cache lines [and, more generally, many small non-executable memory locations, like MMIO] are worked around by the code? [Like word-processor text flowing around images.]
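
To illustrate what I mean by "worked around": the closest thing I know how to do today is to reserve the chunks by hand, roughly like this (names are placeholders; pinning the section to the aliasing flash addresses would happen in the linker script, which is not shown):

```c
/* Placeholder objects marking 64-byte chunks that application code must
 * not occupy. The ".reserved_lines" section name is hypothetical; the
 * linker script would place it at the addresses that alias the protected
 * I-cache lines, so ordinary code gets laid out around them. */
__attribute__((section(".reserved_lines"), aligned(64), used))
static const unsigned char reserved_line_0[64] = {0};

__attribute__((section(".reserved_lines"), aligned(64), used))
static const unsigned char reserved_line_1[64] = {0};
```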

Looking forward to insights.

Thanks.

This doesn’t seem to make any sense. Perhaps you have some misconception?

Code executed from ITIM is executed directly from ITIM, it is not loaded into the instruction cache, and is not subject to prefetch (totally unnecessary, as it is already in ITIM)

This also makes no sense.

MMIO regions are in an utterly different part of the memory map to anywhere you might store code (Flash, ITIM, DTIM)

There are 4294967296 memory addresses available, 4194304 of which are used for flash, 16384 for DTIM, 16384 for the ICACHE/ITIM, and a few hundred bytes scattered through the first 256 MB for MMIO of various types.
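
Roughly, from memory (the manual's memory map is authoritative), those regions live at:

```c
/* Approximate FE310 base addresses, quoted from memory; see the manual's
 * memory map for the authoritative figures. */
#define FE310_ITIM_BASE   0x08000000u  /* I-cache / ITIM */
#define FE310_PERIPH_BASE 0x10000000u  /* MMIO peripherals (AON, GPIO, UARTs, QSPI control, ...) */
#define FE310_FLASH_BASE  0x20000000u  /* QSPI 0 execute-in-place flash */
#define FE310_DTIM_BASE   0x80000000u  /* DTIM */
```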

See the memory map on p22 of the manual:

Bruce Hoult wrote (December 11):

This doesn’t seem to make any sense. Perhaps you have some misconception?

Code executed from ITIM is executed directly from ITIM, it is not loaded into the instruction cache, and is not subject to prefetch (totally unnecessary, as it is already in ITIM)

Yes. I understand how ITIM works.
What I am considering is the associated cache line that is filled from OTP or QSPI ROM. Those lines are read into the part of the cache that is not reserved for ITIM.

When code is executing from OTP or QSPI, or from DTIM for that matter, bit 17 from the table controls the next-line prefetch for these sources, and that is what I am asking about. What are the specifics of this behaviour when the prefetcher is enabled?

Thanks for responding. Note, I said “like” meaning “in a manner similar to”.

[somehow the rest of the email response I sent did not show up here.] So here it is:

MMIO regions are in an utterly different part of the memory map to anywhere you might store code (Flash, ITIM, DTIM)

Not always true. Consider a custom device that provides both QSPI flash and random-number generation or crypto-key access over the QSPI handshake.

For the FE310 such a device would appear in the QSPI memory map, but would have parts of its address space carved out for these MMIO features.

Reduced chip count is a compelling rationale for such MMIO/Flash mixing.

But this is not my concern or my question at all.

I was asking whether gcc or LLVM had such a generalized mechanism. My application is not about MMIO at all, but about optimizing ITIM use.