I found this doc from SiFive/StarFive that explains the L2 prefetcher of the U74-MC (previous gen) in detail:
See Chapter 13.2.5
Now we have private L1/L2 and a shared L3, but I assume some of the terminology still applies. It’s far more readable than the TRM released by ESWIN.
drmpeg’s code suffered a significant perf regression: it took ~2x the time to finish. I did some tweaking and found that you don’t need that many changes to the L1/L2 prefetcher settings to boost the STREAM workload; that many changes also penalize drmpeg’s LDPC workload. Starting from the original values of CSRs 0x7c3 and 0x7c4 from before the patch "hifive-premier-p550: opensbi: Modify CSR registers · sifiveinc/meta-sifive@9759264 · GitHub", increasing maxL1PFDist a little is all you need to boost STREAM performance. I tried setting it to 2 or 3 and got pretty good STREAM perf (on par with the new firmware release), with no noticeable regression in LDPC. ESWIN/SiFive need to do more testing so that the L1/L2 prefetcher settings fit a wider range of workloads. Even better, they could provide an SBI interface so the settings can be adjusted without having to flash new firmware.
FYI, my current settings:
0x7c3: 0x1005c1be649
{
  "reg": "0x7c3",
  "name": "L1 Prefetcher CSR",
  "fields": {
    "l1pfEnable": 1,
    "window": 36,
    "initialDist": 12,
    "maxAllowedDist": 31,
    "linToExpThrd": 3,
    "qFullnessThrdL1": 14,
    "hitCacheThrdL1": 2,
    "hitMSHRThrdL1": 0,
    "issueBubble": 0,
    "maxL1PFDist": 2,
    "forgiveThrd": 0,
    "numL1PFIssQEnt": 0
  }
}
0x7c4: 0x929f
{
  "reg": "0x7c4",
  "name": "L2 Prefetcher CSR",
  "fields": {
    "l2pfEnable": 1,
    "qFullnessThrdL2": 15,
    "hitCacheThrdL2": 20,
    "hitMSHRThrdL2": 4,
    "numL2PFIssQEnt": 2
  }
}
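In case anyone wants to cross-check the raw values against the decoded fields, here is a minimal Python sketch. The bit positions and widths are my assumption, inferred only from the raw/decoded pairs above and not taken from any vendor documentation, so treat the layout tables as hypothetical. For 0x7c3 I only list the low fields; the values above do not pin down the widths of the upper ones (hitCacheThrdL1 and higher).

```python
# Bit-field decoder for the two prefetcher CSR values quoted above.
# ASSUMPTION: the (lsb, width) layouts are inferred from the raw/decoded
# pairs in this post, not from vendor documentation.

# Each entry: (field name, least significant bit, width in bits).
# width=None means "all remaining upper bits" (true width unknown).
L2PF_LAYOUT = [
    ("l2pfEnable", 0, 1),
    ("qFullnessThrdL2", 1, 4),
    ("hitCacheThrdL2", 5, 5),
    ("hitMSHRThrdL2", 10, 4),
    ("numL2PFIssQEnt", 14, None),
]

# Only the low fields of 0x7c3; the upper fields are left out on purpose.
L1PF_LOW_LAYOUT = [
    ("l1pfEnable", 0, 1),
    ("window", 1, 6),
    ("initialDist", 7, 6),
    ("maxAllowedDist", 13, 6),
    ("linToExpThrd", 19, 6),
    ("qFullnessThrdL1", 25, 4),
]


def decode(value, layout):
    """Split a raw CSR value into named fields according to layout."""
    fields = {}
    for name, lsb, width in layout:
        shifted = value >> lsb
        if width is not None:
            shifted &= (1 << width) - 1
        fields[name] = shifted
    return fields


if __name__ == "__main__":
    print("0x7c4:", decode(0x929F, L2PF_LAYOUT))
    print("0x7c3 (low fields):", decode(0x1005C1BE649, L1PF_LOW_LAYOUT))
```

With this assumed layout, both raw values decode to the same numbers as in the dumps above, which at least suggests the dumps and the hex values are consistent with each other.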