Is it possible to run a neural network on the HiFive1?

Valentin9265 · June 13, 2018, 12:32pm

I was asked to run and test a 5-layer neural network on this board. After a few tries, this doesn’t sound very feasible to me considering the 16KB of RAM available. The input to the network is about 100 KB in size, and I don’t think it is possible to write to flash memory while the program is running on the board. Even if it were, it would probably be extremely slow and would have to be done in tiny chunks.

Is there something I’m missing here? Is there a practical way to use the flash memory for storing intermediate computation results?

Thank you for your help

terpstra · June 13, 2018, 1:24pm

You are correct that it would be extremely slow to write/read intermediate results to/from flash. How many neurons are in your network? What performance and arithmetic do you need (double/float/integer)? This might be a better fit for the HiFive Unleashed optionally with an attached FPGA accelerator or a future RISC-V machine equipped with Vector extensions.

Valentin9265 · June 13, 2018, 1:50pm

It has 2 convolutional layers (5x5x3x32 + 5x5x32x32 parameters) followed by 3 fully-connected layers (100x18432 + 100x100 + 100x4). Real-time (20-30 fps) would be preferred and with integer precision. It is a relatively small model, but it still seems a bit too big for the HiFive1. We might look into those other boards you mentioned.
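To put those layer sizes next to the 16KB of RAM, here is a minimal sketch that totals the weights listed above. It assumes one byte per weight for the comparison and ignores biases, since the post does not list them; the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Weight counts as given in the post (biases are not listed, so they are omitted). */
uint32_t network_weight_total(void)
{
    const uint32_t layers[] = {
        5u * 5u * 3u * 32u,   /* conv 1 */
        5u * 5u * 32u * 32u,  /* conv 2 */
        100u * 18432u,        /* fully connected 1 */
        100u * 100u,          /* fully connected 2 */
        100u * 4u             /* fully connected 3 */
    };
    uint32_t total = 0;
    for (size_t i = 0; i < sizeof layers / sizeof layers[0]; i++) {
        total += layers[i];
    }
    return total;
}

/* Returns 1 if `values` items of `bytes_per_value` bytes fit in `ram_bytes`. */
int fits_in_ram(uint32_t values, uint32_t bytes_per_value, uint32_t ram_bytes)
{
    assert(bytes_per_value > 0);
    return (uint64_t)values * bytes_per_value <= ram_bytes;
}
```

The total comes to 1,881,600 weights, almost all of them in the first fully-connected layer. Its 18432-element input alone does not fit in 16KB (16384 bytes), even at one byte per value.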

terpstra · June 13, 2018, 2:29pm

So, the bottleneck is going to be the erase performance of the flash. The IS25LP128 datasheet states that it can erase the entire chip (128Mbit) in 30 seconds = 4369kbit/s, typical. Your first fully-connected layer receives 148kbit, assuming 32-bit precision. At 30fps, that’s 4440kbit/s you would need to push, just for the input vector.
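The arithmetic above can be checked with a short sketch. The function names are made up for illustration, and the 128Mbit / 30 s figures are the ones quoted in the post. Note that 18432 values come to about 147 kbit at 8 bits per value and about 590 kbit at 32 bits, so the 4440kbit/s figure appears to correspond to 8-bit values.

```c
#include <assert.h>
#include <stdint.h>

/* Average erase rate in bits/s if the whole chip erases in `seconds`. */
uint64_t erase_bits_per_sec(uint64_t chip_bits, uint32_t seconds)
{
    assert(seconds > 0);
    return chip_bits / seconds;
}

/* Bits/s needed to rewrite one vector of `values` elements every frame. */
uint64_t required_bits_per_sec(uint32_t values, uint32_t bits_per_value, uint32_t fps)
{
    return (uint64_t)values * bits_per_value * fps;
}
```

With these, `erase_bits_per_sec(128ull * 1024 * 1024, 30)` is about 4.47 Mbit/s, while `required_bits_per_sec(18432, 8, 30)` is about 4.42 Mbit/s and `required_bits_per_sec(18432, 32, 30)` is about 17.7 Mbit/s. At 8 bits the input vector alone uses essentially the whole typical erase rate, and at 32 bits it exceeds it roughly fourfold.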

Fortunately, your matrix coefficients do not change, so their up-to-100x bandwidth doesn’t have to fit into the erase budget. Still, I don’t think this is feasible.

Valentin9265 · June 14, 2018, 7:10am

Going with those numbers, the whole network would need at least 3.3 seconds (optimistically) of flash writes to fully process one input, not including reads from flash or computation time. Up to 5 seconds per frame might be acceptable though, so I would just like to test it, but I can’t find any info on how to write to flash memory at runtime.

From what I found, it sounds like writing to flash at runtime is a hack and that you have to write your own bootloader to allow it.

terpstra · June 14, 2018, 9:10am

Everything about microcontroller programming is a hack. I don’t think loading your program into the SRAM to use the flash for bulk storage is particularly worse than most other things people do with microcontrollers. Anyway, your use-case is going to require that intermediate results are stored in something that can be rewritten faster than SPI/flash.

bruce · June 14, 2018, 4:47pm

That said, it would be good if SiFive provided sample code for how to erase a sector of the flash and how to write new data into flash – complete with copying the necessary code into SRAM, disabling flash memory mapping, doing the actual SPI transactions (which is the only part that is well documented), and then re-enabling flash memory mapping.
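As a rough illustration of the "actual SPI transactions" step only, the sketch below builds the command frames that a typical SPI NOR flash expects. The opcodes are the common SPI NOR values and are an assumption here; confirm them against the IS25LP128 datasheet. The FE310-specific parts (running from SRAM, switching the memory-mapped interface off and on, and clocking bytes through the SPI controller) are deliberately left out, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Common SPI NOR opcodes (assumed; check the flash datasheet). */
enum {
    NOR_CMD_WRITE_ENABLE = 0x06,
    NOR_CMD_READ_STATUS  = 0x05,
    NOR_CMD_PAGE_PROGRAM = 0x02,
    NOR_CMD_SECTOR_ERASE = 0x20
};

/* Build opcode + 24-bit address, most significant byte first. Returns frame length. */
size_t nor_cmd_with_addr(uint8_t opcode, uint32_t addr, uint8_t out[4])
{
    assert(addr <= 0xFFFFFFu);
    out[0] = opcode;
    out[1] = (uint8_t)(addr >> 16);
    out[2] = (uint8_t)(addr >> 8);
    out[3] = (uint8_t)addr;
    return 4;
}

/* Bit 0 of the status register is the write-in-progress flag on typical parts. */
int nor_is_busy(uint8_t status)
{
    return (status & 0x01u) != 0;
}

/*
 * Intended order of operations, all executed from SRAM:
 *   1. disable the memory-mapped flash interface
 *   2. send NOR_CMD_WRITE_ENABLE
 *   3. send the erase or program frame (plus data for a program)
 *   4. poll NOR_CMD_READ_STATUS until nor_is_busy() returns 0
 *   5. re-enable the memory-mapped flash interface
 */
```

Keeping the frame building separate from the register access makes this part testable on a PC before anything touches the board.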

@mhamrick asked me on Tuesday whether I’d written code to do this. I’ve thought about it a few times over the last 18 months, but it’s pretty scary stuff and I kept hoping @mwachs5 or someone would do it…

terpstra · June 14, 2018, 5:07pm

The SiFive SPI controller documentation covers how to turn on/off the memory mapped interface and how to issue general SPI commands. You can also look at the Linux device driver for example code. The datasheet for the IS25LP128 includes the SPI commands you would need to erase and program.

mhamrick · June 14, 2018, 5:35pm

Or we could just do it and make the code available to people interested in this use case.

mhamrick · June 14, 2018, 5:46pm

There are also SPI or parallel external SRAM chips, if you don’t mind adding hardware. I doubt they’re slower than the SPI interface speed, and the typical maximum frequencies I’ve seen are 20-25 MHz, so that’s probably going to be a touch faster than flash.

bruce · June 14, 2018, 5:47pm

That is quite far from having source code for a C function that takes a pointer to some bytes, a length, and the address you’d like it to show up in the address space later…
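A function of that shape could look roughly like the sketch below. It is not SiFive code. It assumes a 256-byte program page (typical for SPI NOR parts, so check the datasheet) and pushes the hardware-specific work into a hypothetical `page_program_fn` callback, so only the chunking logic is shown. The target range must already be erased, because programming NOR flash can only change bits from 1 to 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NOR_PAGE_SIZE 256u  /* assumed page size; check the datasheet */

/* Hypothetical hook that programs at most one page; returns 0 on success. */
typedef int (*page_program_fn)(uint32_t addr, const uint8_t *data, size_t len, void *ctx);

/* Split a write so that no single program operation crosses a page boundary. */
int flash_write(uint32_t addr, const uint8_t *src, size_t len,
                page_program_fn program, void *ctx)
{
    assert(program != NULL);
    while (len > 0) {
        size_t room = NOR_PAGE_SIZE - (addr % NOR_PAGE_SIZE);
        size_t chunk = len < room ? len : room;
        int rc = program(addr, src, chunk, ctx);
        if (rc != 0) {
            return rc;
        }
        addr += (uint32_t)chunk;
        src += chunk;
        len -= chunk;
    }
    return 0;
}

/* Host-side stand-in for the hook: records what would have been programmed. */
struct write_log {
    unsigned chunks;
    size_t bytes;
    int crossed_page;
};

int log_page_program(uint32_t addr, const uint8_t *data, size_t len, void *ctx)
{
    struct write_log *log = ctx;
    (void)data;
    log->chunks++;
    log->bytes += len;
    if ((addr % NOR_PAGE_SIZE) + len > NOR_PAGE_SIZE) {
        log->crossed_page = 1;
    }
    return 0;
}
```

Writing 600 bytes starting at offset 0xF0 within a page, for example, is split into chunks of 16, 256, 256 and 72 bytes. On the board, the stand-in would be replaced by a routine that runs from SRAM and performs the actual SPI transaction.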

bruce · June 14, 2018, 5:53pm

The flash chip we use is supposed to be able to do 66 MB/sec doing quad SPI at 133 MHz. Raw, of course, so less than that of actual useful data. But it’s not bad. We’re just not driving it anywhere near that quickly.
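The 66 MB/sec figure follows directly from the clock rate and the number of data lines, as this small sketch shows. The function name is made up for illustration, and the result is the raw bus rate before command, address and dummy-cycle overhead.

```c
#include <assert.h>
#include <stdint.h>

/* Raw SPI transfer rate in bytes/s: one bit per data line per clock cycle. */
uint64_t spi_raw_bytes_per_sec(uint32_t clock_hz, uint32_t data_lines)
{
    assert(data_lines > 0);
    return (uint64_t)clock_hz * data_lines / 8u;
}
```

`spi_raw_bytes_per_sec(133000000, 4)` gives 66.5 MB/s for quad SPI at 133 MHz, while a single data line at 20 MHz gives 2.5 MB/s.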

thornschuh · June 14, 2018, 6:42pm

Hi, regarding flash r/w code for the fe310 I recently found something in the source code from Mynewt:

https://github.com/apache/mynewt-core/blob/master/hw/mcu/sifive/fe310/src/hal_flash.c


Of course, the Mynewt dependencies must be removed to use it standalone.
I have only looked at it and have not verified anything.

hasanunlu9 · June 25, 2020, 5:21pm

I did run a LeNet-5 neural network (heavily memory-optimized) on the HiFive1, even with an ArduCAM camera: GitHub - hasanunlu/neural_network_deployment_for_uC: Efficient neural network deployment for uC using pytorch model
