Is it possible to run a neural network on the HiFive1?

I was asked to run and test a 5-layer neural network on this board. After a few tries, this doesn’t sound very feasible to me, considering the 16 KB of RAM available. The input to the network is about 100 KB in size, and I don’t think it is possible to write to flash memory while the program is running on the board. Even if it were, it would probably be extremely slow and would have to be done in tiny chunks.

Is there something I’m missing here? Is there a practical way to use the flash memory for storing intermediate computation results?

Thank you for your help

You are correct that it would be extremely slow to write/read intermediate results to/from flash. How many neurons are in your network? What performance and arithmetic do you need (double/float/integer)? This might be a better fit for the HiFive Unleashed optionally with an attached FPGA accelerator or a future RISC-V machine equipped with Vector extensions.

It has 2 convolutional layers (5x5x3x32 + 5x5x32x32 parameters) followed by 3 fully-connected layers (100x18432 + 100x100 + 100x4). Real-time (20-30 fps) would be preferred, with integer precision. It is a relatively small model, but it still seems a bit too big for the HiFive1. We might look into those other boards you mentioned.
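For a rough sense of scale, the layer shapes quoted above can be tallied against the 16 KB of RAM. This is only a back-of-the-envelope sketch: the layer names and the 8-bit width for weights and activations are assumptions, not something stated in the thread.

```python
# Parameter counts taken from the layer shapes quoted above (biases ignored).
layers = {
    "conv1": 5 * 5 * 3 * 32,
    "conv2": 5 * 5 * 32 * 32,
    "fc1": 100 * 18432,
    "fc2": 100 * 100,
    "fc3": 100 * 4,
}
total_params = sum(layers.values())

RAM_BYTES = 16 * 1024  # RAM size as stated in the thread
BYTES_PER_VALUE = 1    # assumption: 8-bit integer weights and activations

weight_bytes = total_params * BYTES_PER_VALUE
fc1_input_bytes = 18432 * BYTES_PER_VALUE

for name, count in layers.items():
    print(f"{name}: {count} parameters")
print(f"total: {total_params} parameters, {weight_bytes / 1024:.1f} KiB of weights")
print(f"fc1 input: {fc1_input_bytes} bytes vs {RAM_BYTES} bytes of RAM")
```

Under these assumptions the first fully-connected layer dominates the weight count, and its input vector alone is already larger than the RAM.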

So, the bottleneck is going to be the erase performance of the flash. The IS25LP128 datasheet states that it can erase the entire chip (128 Mbit) in 30 seconds, typical, which works out to 4369 kbit/s. Your first fully-connected layer receives 18432 values, which is about 148 kbit at 8-bit precision (32-bit precision would be four times that). At 30 fps, that’s 4440 kbit/s you would need to push, just for the input vector.
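The erase-budget arithmetic can be checked with a few lines. This sketch works in plain bits per second to avoid mixing 1000-based and 1024-based prefixes, so its results differ slightly from the rounded kbit figures above. Note that 18432 values at 8 bits each give roughly the 148 kbit quoted, while 32-bit values would need four times as much.

```python
FLASH_BITS = 128 * 1024 * 1024  # 128 Mbit chip
CHIP_ERASE_S = 30               # typical full-chip erase time quoted above
FC1_INPUTS = 18432              # inputs to the first fully-connected layer
FPS = 30

erase_bits_per_s = FLASH_BITS / CHIP_ERASE_S


def needed_bits_per_s(bits_per_value):
    """Bits per second needed to rewrite the fc1 input vector once per frame."""
    return FC1_INPUTS * bits_per_value * FPS


print(f"erase rate: {erase_bits_per_s / 1e6:.2f} Mbit/s")
for width in (8, 32):
    need = needed_bits_per_s(width)
    print(f"{width}-bit values: {need / 1e6:.2f} Mbit/s needed, "
          f"{need / erase_bits_per_s:.2f}x the erase rate")
```

Even at 8 bits, this one vector consumes essentially the whole erase rate before any other intermediate result is written.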

Fortunately, your matrix coefficients do not change, so their bandwidth, up to 100× that of the input vector, doesn’t have to fit into the erase budget. Still, I don’t think this is feasible.

Going with those numbers, the whole network would need at least 3.3 seconds (optimistically) of flash writes to fully process one input, not including reads from flash or computation time. Up to 5 seconds per frame might be acceptable though, so I would just like to test it, but I can’t find any info on how to write to flash memory at runtime.

From what I found, it sounds like writing to flash at runtime is a hack and that you have to write your own bootloader to allow it.

Everything about microcontroller programming is a hack. I don’t think loading your program into the SRAM to use the flash for bulk storage is particularly worse than most other things people do with microcontrollers. Anyway, your use-case is going to require that intermediate results are stored in something that can be rewritten faster than SPI/flash.

That said, it would be good if SiFive provided sample code for how to erase a sector of the flash and how to write new data into flash – complete with copying the necessary code into SRAM, disabling flash memory mapping, doing the actual SPI transactions (which is the only part that is well documented), and then re-enabling flash memory mapping.
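To make the erase-then-program constraint concrete, here is a toy host-side model of NOR flash behaviour. It is not FE310 or SiFive code and performs no SPI transactions. The names `NorFlashModel` and `flash_write` are hypothetical, and the 4 KB sector and 256-byte page sizes are assumptions to verify against the IS25LP128 datasheet.

```python
SECTOR_SIZE = 4096  # assumed erase granularity
PAGE_SIZE = 256     # assumed program granularity


class NorFlashModel:
    """Toy model: erase sets all bits to 1, programming can only clear bits."""

    def __init__(self, size):
        self.mem = bytearray(b"\xff" * size)

    def erase_sector(self, addr):
        start = addr - addr % SECTOR_SIZE
        self.mem[start:start + SECTOR_SIZE] = b"\xff" * SECTOR_SIZE

    def program_page(self, addr, data):
        # A single page program must not cross a page boundary.
        assert addr % PAGE_SIZE + len(data) <= PAGE_SIZE
        for i, byte in enumerate(data):
            self.mem[addr + i] &= byte  # bits can only go from 1 to 0


def flash_write(flash, addr, data):
    """Erase every sector touched by the write, then program it page by page.

    Anything else stored in those sectors is lost.
    """
    if not data:
        return
    first_sector = addr - addr % SECTOR_SIZE
    for sector in range(first_sector, addr + len(data), SECTOR_SIZE):
        flash.erase_sector(sector)
    offset = 0
    while offset < len(data):
        room = PAGE_SIZE - (addr + offset) % PAGE_SIZE
        chunk = data[offset:offset + room]
        flash.program_page(addr + offset, chunk)
        offset += len(chunk)


flash = NorFlashModel(4 * SECTOR_SIZE)
payload = bytes(range(256)) * 2
flash_write(flash, 100, payload)
print(bytes(flash.mem[100:100 + len(payload)]) == payload)
```

On real hardware each of these steps would be SPI commands issued from code running in SRAM while the memory-mapped flash interface is disabled, as described above. The model only shows why every rewrite costs an erase.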

@mhamrick asked me on Tuesday whether I’d written code to do this. I’ve thought about it a few times over the last 18 months, but it’s pretty scary stuff and I kept hoping @mwachs5 or someone would do it…

The SiFive SPI controller documentation covers how to turn the memory-mapped interface on and off and how to issue general SPI commands. You can also look at the Linux device driver for example code. The datasheet for the IS25LP128 includes the SPI commands you would need to erase and program.

Or we could just do it and make the code available to people interested in this use case.

There are also SPI or parallel external SRAM chips, if you don’t mind adding hardware. I doubt they’re slower than the SPI interface speed, and the typical maximum frequencies I’ve seen are 20-25 MHz, so that’s probably going to be a touch faster than flash.

That is quite far from having source code for a C function that takes a pointer to some bytes, a length, and the address you’d like it to show up in the address space later…

The flash chip we use is supposed to be able to do 66 MB/sec doing quad SPI at 133 MHz. Raw, of course, so less than that of actual useful data. But it’s not bad. We’re just not driving it anywhere near that quickly.
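As a sanity check on the raw numbers, peak SPI throughput is just clock rate times data lines. The sketch below reproduces the 66 MB/sec figure and, for comparison, a single-lane part at the 20-25 MHz mentioned earlier in the thread. The helper name and the single-lane assumption for the SRAM are illustrative, and command and address overhead is ignored.

```python
def spi_bytes_per_s(clock_hz, data_lines):
    """Raw peak throughput: one bit per data line per clock cycle."""
    return clock_hz * data_lines / 8


quad_flash = spi_bytes_per_s(133e6, 4)    # quad SPI at 133 MHz, as quoted above
spi_sram_slow = spi_bytes_per_s(20e6, 1)  # assumption: single-lane SPI SRAM
spi_sram_fast = spi_bytes_per_s(25e6, 1)

FC1_INPUT_BYTES = 18432  # assumption: 8-bit activations

print(f"quad SPI flash reads: {quad_flash / 1e6:.1f} MB/s raw")
print(f"single-lane SPI SRAM: {spi_sram_slow / 1e6:.2f} to {spi_sram_fast / 1e6:.3f} MB/s raw")
print(f"moving the fc1 input at 20 MHz: {FC1_INPUT_BYTES / spi_sram_slow * 1e3:.2f} ms")
```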

Hi, regarding flash r/w code for the fe310 I recently found something in the source code from Mynewt:


Of course, the Mynewt dependencies must be removed to use it standalone.
I have only read through it and have not verified anything.

I did run a LeNet-5 neural network (heavily memory optimized) on the HiFive1, even with an ArduCam camera. The code is on GitHub at hasanunlu/neural_network_deployment_for_uC ("Efficient neural network deployment for uC using pytorch model").