L2CACHE: DataError @ 0x00000018.CCD1D0B0

Hi,
I just received my HiFive Unmatched - awesome!

Right now I am using the OpenEmbedded-built kernel and dtb, but with a Debian rootfs on NVMe. Works like a charm :slight_smile:
But while browsing through dmesg after power on I see this:

...
[    2.267931] L2CACHE: DataError @ 0x00000018.CCD1D0B0
[    2.267985] L2CACHE: No. of Banks in the cache: 4
[    2.277316] L2CACHE: No. of ways per bank: 16
[    2.281657] L2CACHE: Sets per bank: 512
[    2.285478] L2CACHE: Bytes per cache block: 64
[    2.289908] L2CACHE: Index of the largest way enabled: 15
...
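
For anyone who wants to sanity-check these lines, here is a minimal Python sketch that parses the excerpt above. It rests on two assumptions that are not confirmed in this thread: that the two hex words in the DataError line are the high and low 32-bit halves of one physical address, and that the total cache capacity is simply banks × ways × sets × block size. The function name `parse_l2` is made up for illustration.

```python
import re

# The dmesg excerpt quoted above, verbatim.
LOG = """\
[    2.267931] L2CACHE: DataError @ 0x00000018.CCD1D0B0
[    2.267985] L2CACHE: No. of Banks in the cache: 4
[    2.277316] L2CACHE: No. of ways per bank: 16
[    2.281657] L2CACHE: Sets per bank: 512
[    2.285478] L2CACHE: Bytes per cache block: 64
[    2.289908] L2CACHE: Index of the largest way enabled: 15
"""

ERROR_RE = re.compile(r"L2CACHE: (\w+) @ 0x([0-9A-Fa-f]{8})\.([0-9A-Fa-f]{8})")
FIELD_RE = re.compile(r"L2CACHE: (.+): (\d+)$")


def parse_l2(log):
    """Collect the error line and the numeric geometry fields into a dict."""
    info = {}
    for line in log.splitlines():
        m = ERROR_RE.search(line)
        if m:
            info["error"] = m.group(1)
            # Assumption: "HIGH.LOW" are the two 32-bit halves of the address.
            info["address"] = (int(m.group(2), 16) << 32) | int(m.group(3), 16)
            continue
        m = FIELD_RE.search(line)
        if m:
            info[m.group(1)] = int(m.group(2))
    return info


info = parse_l2(LOG)

# Assumption: capacity = banks * ways * sets * bytes per block.
capacity = (
    info["No. of Banks in the cache"]
    * info["No. of ways per bank"]
    * info["Sets per bank"]
    * info["Bytes per cache block"]
)

print(info["error"], hex(info["address"]))
print(capacity // 1024, "KiB")
```

Under those assumptions the geometry works out to 2048 KiB, i.e. a 2 MiB L2.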

This happens both with the older 5.11.10 kernel it came with and with the more recent 5.12.4 from GitHub.

Is this a known issue? And (hopefully) normal?
Or does my CPU indeed have a defect in the L2 cache? The latter would be a pity :frowning:

Thanks!

Cheers
nicole

As far as I remember, this has been there since the FU540 (so for years). It happens right after the driver hooks up the interrupt handler, before it queries the L2 cache information. You shouldn’t worry about it. A DataError is most likely a correctable error and thus does not result in a kernel panic. An uncorrectable error would result in a kernel panic.

Excellent, thank you very much for the fast response!
That is actually a great relief :slight_smile:

Cheers
nicole

Thanks a lot for posting this, I also noticed it but was too slow to ask about it.

While searching for information about this error, I found the four error types (both in the chip’s spec and in the kernel L2 cache driver):

  • DirError: cache metadata error, correctable by ECC
  • DirFail: cache metadata error, detected by ECC but non-correctable
  • DataError: cache data error, correctable by ECC
  • DataFail: cache data error, detected by ECC but non-correctable
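
The four types above can be written down as a small lookup table. This is only a sketch of the list as I understand it, not code from the kernel driver; the names `L2_ERRORS` and `describe` are made up for illustration.

```python
# Error name -> (what is affected, whether ECC can correct it),
# following the four types listed above.
L2_ERRORS = {
    "DirError": ("metadata", True),
    "DirFail": ("metadata", False),
    "DataError": ("data", True),
    "DataFail": ("data", False),
}


def describe(name):
    """Return a one-line human-readable summary for an L2 error name."""
    kind, correctable = L2_ERRORS[name]
    outcome = "corrected by ECC" if correctable else "detected by ECC but not correctable"
    return f"{name}: cache {kind} error, {outcome}"


for name in L2_ERRORS:
    print(describe(name))
```

Going by the naming, the `*Error` variants are the correctable ones and the `*Fail` variants are the ones to worry about.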

I also found that the ZSBL wipes the L2 cache early on, precisely so that such errors should not happen… It does not write to Way0, as per the specification, so could Way0 be causing this error? Or could the error just be a stale detection from earlier in the boot that was never cleared, and that the kernel happens to find when it starts looking for it?

On a vanilla 5.13-rc6 kernel I am seeing different behaviour: on a cold boot, there is a constant spam of DataError messages (with an apparently random address on each boot, but always the same address in every message within that boot), probably preventing the boot from progressing. But after a reset, there is no DataError at all.

…which is fixed by this devicetree patch from the meta-sifive repository. And I guess the reason why I am not getting it on every boot is that my kernel is failing to clear the condition, which prevents the IRQ from firing at all after a reset. This is the risk I take by using vanilla: it’s my responsibility to backport the right stuff.