I purchased a Samsung 970 EVO Plus 500GB (MZ-V7S500B/AM) about a year ago. The SiFive documentation says the Unmatched board supports a Samsung 970 EVO Plus 500GB MZ-V7S550B/AM, so the model number in the documentation looks like a typographical error.
The SSD started generating Input/output errors within the past couple of weeks, and it seems to switch automatically into a read-only safe mode. The machine and the SSD had been unused for a couple of months. It wasn't until about last fall that I started using the system on a regular basis.
So far I have…
tweaked the kernel parameters for nvme (turned off APST)
run smartctl without error
run dmesg without errors
checked the kernel logs and saw no errors
tweaked kernel parameters to run and force fsck; fsck fixed a filesystem organizational issue to optimize performance.
performed a dd of the nvme’s content into /dev/null without error
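For anyone repeating the log checks above after each failure, a small filter can make the relevant lines easier to spot. This is only a sketch: the function name `find_suspect_lines` and the keyword list are assumptions for illustration, not anything taken from the kernel or from SiFive, and the keywords will likely need adjusting for your kernel and filesystem.

```python
import re

# Assumed keywords (hypothetical, adjust as needed): anything mentioning
# the NVMe driver, an I/O error, or a read-only remount.
SUSPECT = re.compile(r"nvme|i/o error|read-only", re.IGNORECASE)


def find_suspect_lines(lines):
    """Return the log lines that match any of the assumed keywords."""
    return [line.rstrip("\n") for line in lines if SUSPECT.search(line)]
```

One way to use it is to save the output of dmesg to a file right after an error appears, open that file in Python, and pass the file object to `find_suspect_lines`.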
The errors seem to occur when I run cmake and let it try to download packages and cache them for complex builds.
Any advice on what to do or try next would be appreciated. I’m leaning toward purchasing a new NVMe SSD.
DRAM timing issues are a possibility. This is a known problem with some original Unmatched boards, and possibly also with the newer Unmatched Rev B boards. The DRAM speed is set in U-Boot, and the default choice recommended by SiFive doesn't work reliably for all boards. If you are having DRAM timing issues, you will get weird failures. Sometimes a program crashes for no obvious reason, and sometimes you get an input/output error. If your board is susceptible to this problem, it tends to get worse under load, such as when doing a parallel build. If this is the problem, then the solution is to decrease the DRAM speed. I believe one of the Linux distros, Debian I think, deliberately runs the board with a slower DRAM speed than recommended by SiFive to make it more reliable.
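A rough way to see whether failures track load is to write checksummed files in parallel and read them back. The sketch below is an assumption-laden illustration, not a SiFive or distro tool: the names `stress_verify` and `_write_and_check`, the file names, and the default sizes are all hypothetical. Note that the read-back is usually served from the page cache, so a mismatch here points more toward memory than toward the SSD, while an OSError raised during the write or fsync points toward the storage path.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def _write_and_check(path, size_bytes):
    """Write random data to path, read it back, and compare SHA-256 digests."""
    data = os.urandom(size_bytes)
    expected = hashlib.sha256(data).hexdigest()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device
    with open(path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    return path, expected == actual


def stress_verify(directory, n_files=8, size_bytes=1 << 20, workers=4):
    """Return the paths whose read-back digest did not match what was written.

    Any OSError (for example an input/output error) is left to propagate.
    """
    paths = [os.path.join(directory, f"chunk_{i}.bin") for i in range(n_files)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: _write_and_check(p, size_bytes), paths))
    return [path for path, ok in results if not ok]
```

Calling `stress_verify` on a scratch directory on the NVMe drive, with larger `n_files` and `size_bytes`, and comparing the outcome before and after lowering the DRAM speed would give a crude before/after signal. An empty list means every file read back intact.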
I’ve definitely seen parallel program builds break for no reason. Do you have any documentation (links) on how to configure this option in U-Boot? I’m currently running Ubuntu’s SiFive Unmatched Rev B distribution. Thanks in advance.
Thank you @JimWilson @drmpeg for the help, assistance, and support. Unfortunately, it looks like the device is still generating Input/output errors.