How is the HiFive1 core configured?

The specs page says RV32IMAC. 16 KB SRAM, 16 KB 2-way associative instruction cache. 128 Mbit (16 MB) flash. Great.

It says the flash is SPI. So is it only available for serial access, or does hardware transparently map it into the address space? I see 0x20000000 - 0x7FFFFFFF is reserved by the E31 for such purposes.

Is User Mode (and MMU) implemented?

What does the instruction cache cache? i.e. where is it loaded from on a cache miss? E31 Coreplex manual says the optional data/instruction caches reload from the TileLink. So … from the flash memory, then?

The external flash does indeed have a SPI interface. For read/instruction fetch accesses, hardware does transparently map it into the address space, starting at 0x20000000. Our Freedom E SDK and (soon to be released) Arduino flows load programs at 0x20400000 (leaving the lower addresses for bootloaders, FPGA images, etc). In this memory-mapped mode the user is allowed to set, and is responsible for, the access parameters such as the clock divider and operation mode, though the reset defaults should work with a wide variety of SPI flash chips.
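To make the address arithmetic concrete, here is a minimal sketch in C of how a CPU address in the mapped window relates to a byte offset inside the flash chip. The two base addresses and the 16 MB size come from this thread; the macro and function names are made up for illustration and are not part of the Freedom E SDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in this thread; the names are hypothetical. */
#define FLASH_MAP_BASE   0x20000000u            /* start of the memory-mapped flash window */
#define SDK_PROGRAM_BASE 0x20400000u            /* where the Freedom E SDK loads programs */
#define FLASH_SIZE_BYTES (16u * 1024u * 1024u)  /* 128 Mbit part */

/* True if addr falls inside the mapped flash on this board. */
bool in_mapped_flash(uint32_t addr)
{
    return addr >= FLASH_MAP_BASE && (addr - FLASH_MAP_BASE) < FLASH_SIZE_BYTES;
}

/* Convert a CPU address in the mapped window to a byte offset inside the chip. */
uint32_t flash_offset(uint32_t addr)
{
    assert(in_mapped_flash(addr));
    return addr - FLASH_MAP_BASE;
}
```

With these values, `flash_offset(SDK_PROGRAM_BASE)` is 0x400000, i.e. the SDK leaves the first 4 MiB of the chip for bootloaders and other images.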

For SPI Flash writes, you have to use the SPI register interface to issue the erase & write commands and stream in data.
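As a rough illustration of what "issue the erase & write commands and stream in data" means at the byte level, the sketch below only builds the frame you would push through the SPI registers; it does not touch any hardware. The opcodes (0x06 write enable, 0x20 4 KiB sector erase, 0x02 page program) and the 256-byte page size are assumptions based on what many SPI NOR chips use, so check the datasheet for the actual part. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Opcodes used by many SPI NOR flash chips; ASSUMED here, check the datasheet. */
enum {
    CMD_WRITE_ENABLE = 0x06,
    CMD_SECTOR_ERASE = 0x20, /* 4 KiB erase */
    CMD_PAGE_PROGRAM = 0x02
};

#define FLASH_PAGE_BYTES 256u /* typical program page size, also an assumption */

/* Fill out[0..3] with opcode + 24-bit address, most significant byte first. */
size_t build_cmd_addr(uint8_t *out, uint8_t opcode, uint32_t offset)
{
    assert(offset < (1u << 24));
    out[0] = opcode;
    out[1] = (uint8_t)(offset >> 16);
    out[2] = (uint8_t)(offset >> 8);
    out[3] = (uint8_t)offset;
    return 4;
}

/* Build a page-program frame: opcode, address, then the data bytes.
 * Returns the frame length, or 0 if len is zero, the data would cross
 * a page boundary, or the output buffer is too small. */
size_t build_page_program(uint8_t *out, size_t cap, uint32_t offset,
                          const uint8_t *data, size_t len)
{
    if (len == 0 || len > FLASH_PAGE_BYTES - (offset % FLASH_PAGE_BYTES) || cap < 4 + len)
        return 0;
    size_t n = build_cmd_addr(out, CMD_PAGE_PROGRAM, offset);
    memcpy(out + n, data, len);
    return n + len;
}
```

On typical parts the full sequence would be: write enable, sector erase, wait until the chip is no longer busy, write enable again, then one page-program frame per page, waiting for busy to clear after each.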

In E31 Coreplex, there is no User mode or MMU. Everything is Machine mode.

The ICache caches pretty much everything: Data Scratchpad, SPI Flash, OTP Memory, Mask ROM, Gate ROM, even Debug RAM and ROM. All of these are accessed across TileLink.

Thanks Megan!

So reads to the mapped flash are pretty slow? Is the address sent for every word read? Can you use the SPI registers to stream reads into the data scratchpad faster?

Eager to get my hands on one of these :) (I got one of the last of the “Founder’s Edition”).

Flash is very slow compared to the processor clock speed. You won’t be able to load it any faster by using the SPI register, though. The memory mapped interface already converts sequential accesses into a streaming quad-SPI access.

I wasn’t thinking about the processor clock speed, which is of course vastly faster. I was wondering about how many address and control bits have to go over the slow serial SPI for every data bit.

I’d expect instruction cache fills to stream the 32 bytes, but there is no data cache. So each data word read from mapped flash needs its own transaction?

Ah. Sequentially accessed addresses should be combined into a single SPI command, sharing the address bits and dummy cycles/etc. I’m not certain this has been tested for the data bus, but I’d expect that memcpy from qspi would do the right thing.
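A back-of-the-envelope model in C of why combining sequential accesses matters. The cycle counts are assumptions for a generic quad I/O fast read (8 clocks of opcode on one line, 24 address bits on four lines, some mode/dummy cycles, 32 data bits on four lines); the real numbers depend on the flash chip and on how the controller is configured.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED framing for a quad I/O fast read; not taken from any manual. */
#define CMD_CLOCKS   8u  /* 8-bit opcode on one line */
#define ADDR_CLOCKS  6u  /* 24-bit address on four lines */
#define DUMMY_CLOCKS 6u  /* mode + dummy cycles, chip-specific */
#define WORD_CLOCKS  8u  /* 32 data bits on four lines */

/* SPI clocks to read n_words when every word is its own transaction. */
uint32_t clocks_separate(uint32_t n_words)
{
    return n_words * (CMD_CLOCKS + ADDR_CLOCKS + DUMMY_CLOCKS + WORD_CLOCKS);
}

/* SPI clocks when sequential words share one command/address/dummy phase. */
uint32_t clocks_streamed(uint32_t n_words)
{
    assert(n_words > 0);
    return CMD_CLOCKS + ADDR_CLOCKS + DUMMY_CLOCKS + n_words * WORD_CLOCKS;
}
```

Under these assumed numbers, a 32-byte (8-word) read costs 84 SPI clocks as one streamed command versus 224 as eight separate word reads, so the fixed command/address/dummy overhead dominates whenever reads are not combined.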