Flush/Invalidate L1/L2 on the U54-MC

Hi everyone,

I’m currently working on a U54-MC architecture (PolarFire SoC) and I’m trying to measure the impact (in terms of clock cycles) of a cache invalidate and/or flush for the L1/L2 caches.

From what I understand, for the L1 cache :

  • iCache invalidate : Use FENCE.I instruction (what about CFLUSH.I.L1, what is the difference?)
  • dCache invalidate/flush : Use CFLUSH.D.L1/CFLUSH.D.L1 instructions

For the L2 cache, apparently the only way to flush it is to write “virtual addresses” to the Flush32/Flush64 registers of the L2 controller.
Is there no way to flush the L2-Cache using set/way?
How can I flush the entire L2 Cache with these registers?
Does it mean I have to write all possible addresses present in cache to be sure the cache is 100% flushed?
(Like, if I have 2GB of DDR and each cache block is 64 bytes, I need to loop through 2048x1024x1024/64 addresses?)
Am I missing something?

Let me know if anything I’ve written is incorrect.
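
To put a number on the loop size asked about above, here is a minimal sketch of the arithmetic. The function name is hypothetical, and the 2 MiB figure in the note below is an assumption about the L2 capacity, to be checked against the manual for the actual part.

```c
#include <assert.h>
#include <stdint.h>

/* Number of line-sized writes needed to cover `bytes` of memory,
 * rounding up when `bytes` is not a multiple of the line size. */
uint64_t flush_line_count(uint64_t bytes, uint64_t line_size)
{
    assert(line_size != 0);
    return (bytes + line_size - 1) / line_size;
}
```

For 2 GB of DDR and 64-byte lines this gives 2048 x 1024 x 1024 / 64 = 33,554,432 writes. If the L2 itself were 2 MiB (an assumption), covering a cache-sized window would take only 32,768.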

Some corrections :

  • iCache invalidate : Use CFLUSH.I.L1 instruction (I think I get the difference with FENCE.I)
  • dCache invalidate/flush : Use CDISCARD.D.L1/CFLUSH.D.L1 instructions

However, I’ve tried using any of these instructions by doing the following (same as here) :

__asm__ __volatile__(".word 0xfc000073" : : : "memory");

And apparently I get an “Invalid instruction”… :slight_smile:

Flushing L2 should be very rare as it is coherent between CPU cores and between I/D. If you have IO devices then their pages can be marked as non-cacheable.

Things such as DMA engines should follow the TileLink coherence protocols.

Other devices that modify or read large amounts of memory in an incoherent way are I think not well catered for in a standardized way at the moment. There is a Working Group at the moment defining standard fine grained cache management instructions similar to PowerPC or ARM, but it will be a year or two before silicon is available implementing them.

Hi Bruce,

Thank you for your answer.
The idea behind flushing the entire L1/L2 caches is more about doing some benchmarking, while also checking the functionality of these features (flush/invalidate) on the PolarFire SoC. But I understand your comment.

The way I understand it, there is some ongoing work (I guess you are referring to the CMO TG) to officially add invalidate/flush instructions to the RISC-V specs (instead of custom instructions like the ones I’ve mentioned)? Do you know if it only concerns CMOs for the L1 cache, or also the L2 cache?

By the way, do you have any idea why I can’t get the instructions CFLUSH.I.L1/CFLUSH.D.L1 to work?
Is this even implemented on the U54-MC present in the PolarFire SoC (or even the HiFive Unleashed)? I’m starting to have doubts about it…

I’m afraid I haven’t myself researched or tried those instructions. I don’t have my Icicle yet … I finally got notification it shipped from Texas two days ago. So maybe I’ll have it in a week.

Yes, I’m talking about the CMO TG. I don’t get to many of the “meetings” as they happen at something like 4 AM my time, but I monitor the mailing list and comment from time to time.

There is certainly much discussion in the TG about dealing with different levels of cache, L1, L2, Ln … and talk of things such as “point of convergence”. I don’t know what the end result will be, but these things are being considered.

I believe operations depending on the logical design of the cache – per set or way operations – will NOT be included in the current extension. But a way to flush the entire cache – without iterating through the whole address space 64 (or whatever) bytes at a time – will be.

An update for those interested :
So, I’ve found some information on how to flush the L2 cache by way : Link
Very interesting.

Also, has anyone ever tried using CFLUSH.I.L1/CFLUSH.D.L1/CDISCARD.D.L1 on a HiFive Unleashed?
Either by doing the following :
__asm__ __volatile__(".word 0xfc000073" : : : "memory"); //CFLUSH.D.L1
__asm__ __volatile__(".word 0xfc100073" : : : "memory"); //CFLUSH.I.L1
__asm__ __volatile__(".word 0xfc200073" : : : "memory"); //CDISCARD.D.L1
Or calling the corresponding freedom metal library functions :
metal_dcache_l1_flush(metal_cpu_get_current_hartid(), (uintptr_t)NULL);
metal_dcache_l1_discard(metal_cpu_get_current_hartid(), (uintptr_t)NULL);
Or just running this example.

On my platform (PolarFire SoC), any of these will cause an invalid instruction exception.
(I don’t have access to an Unleashed board right now, so I can’t test it myself. If anyone has tested it or can test it, don’t hesitate to give feedback :slight_smile: )
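
For reference, the three .word constants quoted above follow the standard RISC-V I-type layout with the SYSTEM opcode (0x73), rd = x0 and funct3 = 0. The sketch below rebuilds them from their fields. The enum and function names are hypothetical, and the funct12 values are simply read off the constants in this thread rather than taken from a manual.

```c
#include <assert.h>
#include <stdint.h>

/* funct12 values implied by the .word constants quoted in this thread. */
enum {
    FUNCT12_CFLUSH_D_L1   = 0xFC0,
    FUNCT12_CFLUSH_I_L1   = 0xFC1,
    FUNCT12_CDISCARD_D_L1 = 0xFC2
};

/* Build an I-type word: funct12[31:20] | rs1[19:15] | funct3=0 | rd=x0 | 0x73. */
uint32_t encode_cache_op(uint32_t funct12, uint32_t rs1)
{
    assert(funct12 <= 0xFFF && rs1 <= 31);
    return (funct12 << 20) | (rs1 << 15) | 0x73u;
}
```

With rs1 = 0 (x0) this reproduces 0xfc000073, 0xfc100073 and 0xfc200073. As far as I understand the SiFive manuals, rs1 = x0 selects the whole L1 and a non-zero rs1 names a register holding a line address, but that only matters on cores that implement these instructions at all.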

It seems that first link is only for the big boys with actual support contracts.

Hi Antonin,

The only cache operations supported on the PolarFire SoC and FU540 SoC (on HiFive Unleashed) are the L2 Cache Flush operations (through the Flush32/Flush64 registers) and FENCE.I. Flushing a line in the L2 will also back probe into the L1 caches and flush them if required.

The CFLUSH.D.L1 and CDISCARD.D.L1 custom instructions were added after the designs for those SoCs were completed. CFLUSH.I.L1 isn’t supported in a released product yet, but it accidentally appeared in a few manuals as you found.
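
A minimal sketch of the Flush64 path described above, flushing an address range one 64-byte line at a time. The function name is hypothetical, and the register pointer is passed in so the loop can be tried against a plain variable. The base address and offset in the macros are assumptions from memory of the FU540 memory map and should be checked against the register map for your part. I am also assuming the register takes a physical address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define L2_LINE_BYTES 64u

/* ASSUMED values, not confirmed in this thread: verify in your manual. */
#define L2_CTRL_BASE_ASSUMED      0x02010000UL
#define L2_FLUSH64_OFFSET_ASSUMED 0x200UL

/* Write the address of every line overlapping [start, start + len) to the
 * Flush64 register. Returns the number of register writes issued. */
size_t l2_flush_range(volatile uint64_t *flush64, uint64_t start, uint64_t len)
{
    assert(flush64 != NULL);
    if (len == 0)
        return 0;

    uint64_t line = start & ~(uint64_t)(L2_LINE_BYTES - 1); /* align down */
    uint64_t end = start + len;
    size_t writes = 0;

    for (; line < end; line += L2_LINE_BYTES) {
        *flush64 = line; /* one register write per cache line */
        writes++;
    }
    return writes;
}
```

On hardware the pointer would be something like (volatile uint64_t *)(L2_CTRL_BASE_ASSUMED + L2_FLUSH64_OFFSET_ASSUMED), and real code would probably want fences around the loop, which this sketch leaves out.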

Actually, you just need to create an account. No need for any support contract.
But I’ll copy/paste the article here :

How to Flush the L2 Cache by Way?

This article describes how to flush the L2 Cache by using the zero device. Alternatively, there is a flush-by-address function in the L2 Controller space described in the memory map of the core manual; this functionality is described in the “Cache Flush Registers” section.

A user may want to use the Zero Device if they would like to flush the entire cache. This is a faster way than flushing by the address.

Instructions

To flush a single index+way:

  1. Write WayMask register to allow evictions from only the specified way.
  2. Issue a load (or store) to an address in the L2 zero-device region that corresponds to the specified index.

To flush the entire L2:

  1. Write WayMask register to allow evictions from only way 0.
  2. Issue a series of loads (or stores) to addresses in the L2 zero-device region that correspond to each L2 index. (i.e. one load/store per 64B, total of (way-size-in-bytes/64) loads or stores)
  3. Write WayMask register to allow evictions from only way 1.
  4. Repeat step 2
  5. Repeat steps 3 and 4, moving through each way of the cache, until all ways have been flushed.

To flush a range of physical addresses much larger than a cache-way:

  1. Flush the whole cache as shown above.

To flush a range of physical addresses not much larger than a cache-way:

  1. Either use the existing flush-by-address mechanism, iterating over the addresses, or write WayMask register to allow evictions from only way 0 and continue with the steps below.
  2. Issue a series of loads (or stores) to addresses in the L2 zero-device region that correspond to the L2 index associated with each 64B chunk within the specified address range. (i.e. one load/store per 64B, total of (specified-address-range-in-bytes/64) loads or stores)
  3. Write WayMask register to allow evictions from only way 1.
  4. Repeat step 2.
  5. Repeat steps 3 and 4, moving through each way of the cache, until all ways have been flushed
    (this should all be done with no intervening stores that could create new dirty lines of course).
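
The "flush the entire L2" steps above can be sketched roughly as follows. The function name and parameters are hypothetical. The WayMask register and the zero-device base are passed in as pointers because their addresses are not given in this thread. The sketch also assumes that a set bit in WayMask means "evictions allowed from that way", which should be checked against the manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define L2_LINE_BYTES 64u

/* For each way: restrict evictions to that way, then load once per 64B
 * across one way-size of the zero-device region (one load per L2 index).
 * Returns the total number of loads issued. */
size_t l2_flush_all_by_way(volatile uint64_t *way_mask,
                           const volatile uint8_t *zero_dev,
                           unsigned num_ways, size_t way_bytes)
{
    assert(way_mask != NULL && zero_dev != NULL);
    assert(num_ways > 0 && num_ways <= 64);

    uint64_t saved = *way_mask; /* restore the original mask afterwards */
    size_t loads = 0;

    for (unsigned way = 0; way < num_ways; way++) {
        *way_mask = (uint64_t)1 << way; /* ASSUMED: 1 = way may be evicted */
        for (size_t off = 0; off < way_bytes; off += L2_LINE_BYTES) {
            uint8_t sink = zero_dev[off]; /* volatile load, value unused */
            (void)sink;
            loads++;
        }
    }

    *way_mask = saved;
    return loads;
}
```

As the article notes, nothing else should be creating dirty lines while this runs, so on a multi-hart system the other harts would need to be quiesced first. The sketch does not attempt that.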

Thank you Ralph for letting me know!

Isn’t it kind of a problem to have different versions/implementations of the U54-MC out there but only one datasheet?

Like, there is no information linking a specific piece of hardware (in my case, the PolarFire SoC) to a specific implementation (and therefore a specific datasheet) of the U54-MC? (Microchip/Microsemi only mention “U54-MC”, nothing more.)

(Side note : It might also explain why the behavior of the branch predictor mode CSR appears to be the opposite of what is described in the datasheet)