How is qspi flashed?

I am planning to replace the QSPI flash with an on-chip flash memory for a tapeout, so I am trying to understand how the QSPI is flashed. Is it correct to say that the QSPI is flashed through the JTAG module?
Are there guidelines I can read to understand how QSPI flashing works?

I assume you are talking about the HiFive1? (The answer is much the same for the FPGA Dev Kits, but I am answering as if you are talking about the HiFive1.)

In the SiFive flows, the QSPI is flashed by the processor running code directed by the debugger. So, the JTAG connects to the debug interface, and the debugger (OpenOCD) tells the core to drive the QSPI interface through the peripheral just as if the processor was running a program to write the SPI Flash.

You can see this code here:

And you can read about the OpenOCD/debugger side of it here:

Thanks very much for the recommended material; I will look into it.

I thought that for the processor to run code, the code first has to be loaded into the flash, then copied from flash into the SRAM, and only then does the processor execute those instructions. How is the processor able to execute code while the flash, and consequently the SRAM, have no instructions loaded?

In my tapeout I am planning to flash the memory using an FT2232, as on the HiFive1. I wanted to try that on the Arty FPGA using the HiFive1 script after loading the mcs file of the Coreplex into the FPGA fabric, but I get:
Error: no device found
Error: unable to open ftdi device with vid 0403, pid 6010, description 'Dual RS232-HS', serial '' at bus location ''.
It seems to me that since an FT2232 is available on the Arty FPGA it should be detected. Should I modify the JTAG connection in freedom/fpga/e300artydevkit/src/system.v to connect to the FT2232? If so, is that the only modification I need to make?

Of course that is not true. The processor can execute code from anywhere in the address space. It could be in tables generated in Verilog, mask ROM on the SoC, one-time-programmable flash on the SoC, the scratchpad RAM on the SoC (set up by JTAG commands)…
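
To make the "anywhere in the address space" point concrete, here is a small host-side C sketch that classifies a program counter by the region it falls in. The address ranges are my recollection of the FE310-G000 memory map and should be checked against the manual; the names `fetch_src_t` and `fetch_source` are made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Where an instruction fetch would be served from (illustrative only). */
typedef enum {
    SRC_UNMAPPED,
    SRC_MASK_ROM,
    SRC_OTP,
    SRC_QSPI_FLASH,
    SRC_SCRATCHPAD
} fetch_src_t;

typedef struct {
    uint32_t    base;
    uint32_t    end;   /* inclusive */
    fetch_src_t src;
} region_t;

/* ASSUMPTION: ranges recalled from the FE310-G000 manual, not verified here. */
static const region_t regions[] = {
    { 0x00001000u, 0x00001FFFu, SRC_MASK_ROM   },
    { 0x00020000u, 0x00021FFFu, SRC_OTP        },
    { 0x20000000u, 0x3FFFFFFFu, SRC_QSPI_FLASH },  /* memory-mapped flash window */
    { 0x80000000u, 0x80003FFFu, SRC_SCRATCHPAD },  /* on-chip scratchpad RAM */
};

/* Return the region that would answer a fetch at address pc. */
fetch_src_t fetch_source(uint32_t pc)
{
    for (size_t i = 0; i < sizeof regions / sizeof regions[0]; i++) {
        if (pc >= regions[i].base && pc <= regions[i].end)
            return regions[i].src;
    }
    return SRC_UNMAPPED;
}
```

The core itself does not care which of these regions the PC points into; it issues a fetch and whichever device decodes that address responds.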

How is the processor able to execute code while the flash, and consequently the SRAM, have no instructions loaded?

You may want to take a look at the RISC-V Debug Spec to understand how it works. The FE310 has a small debug buffer memory from which it can execute code. The debugger uses that to read and write the SPI control registers and to load a program into SRAM, which in turn runs and programs the SPI flash.

it seems to me that since FT2232 is available on Arty FPGA it should be detected.

To program the flash on the Arty FPGA, you need to use an external debugger connected to PMOD header JD as explained in the . The HiFive1 works because there is a dedicated FT2232 on the board. To get the same behavior as the HiFive1, you need a debugger that has the FT2232 chip in it. Either way, you should be using the E300 Arty Dev Kit or Coreplex IP FPGA Dev Kit configuration file, not the HiFive1 file.

There is an FT2232 chip on the Arty board, but it’s not currently connected to the FPGA in a useful way to use as the debugger interface. You’re welcome to try modifying the system.v to get it to work, and we’d be interested to hear how it goes!

Thank you for the clarifications.

I have checked the Arty schematics, and the FT2232 symbol is not part of them. On the Xilinx forum they claimed that they omitted it because it is proprietary, which I thought was a bit odd because SiFive disclosed the full schematics of the HiFive1, including the FT2232 connection to the Coreplex. On the forum they also mentioned that only USB-UART can be used, on pins A9 and D10 for Rx and Tx. So if the JTAG pins of the FT2232 are not connected to the Artix-7 FPGA, it is not possible to use it as a debugger. Not knowing which signal names to use as IO in system.v also makes it harder to give it a try.

@kimokono, yes, we went through a very similar exploration. All of those reasons are exactly why we went with the PMOD D connector rather than trying to make use of the included FT2232 chip.

Is the firmware stored in the QSPI loaded into the FE310’s instruction buffer using the “default_flash_read” entry in the following struct definition in openocd/src/flash/nor/fespi.c?

struct flash_driver fespi_flash = {
    .name = "fespi",
    .flash_bank_command = fespi_flash_bank_command,
    .erase = fespi_erase,
    .protect = fespi_protect,
    .write = fespi_write,
    .read = default_flash_read,
    .probe = fespi_probe,
    .auto_probe = fespi_auto_probe,
    .erase_check = default_flash_blank_check,
    .protect_check = fespi_protect_check,
    .info = get_fespi_info
};

Is it the case that the same code that is loaded into the debug buffer memory and then SRAM performs both the “write firmware from PC host to flash” and the “read firmware from flash to FE310 instruction buffer”? If not, what is the mechanism that performs “read firmware from flash to FE310 instruction buffer”?

Any program code (firmware or user code) that is stored in the QSPI flash is fetched by the FE310-G000’s instruction cache. When execution reaches an address mapped to the QSPI flash memory region, the hardware handles reading in the data, storing it in the I-Cache, and executing it. This has nothing to do with OpenOCD or the debug interface. If you do a lw from the SPI flash region, the data is similarly read into the register file automatically.

It is not the same code that does the writing as does the reading. To do a read, we simply have to read the memory location. To do a write, we have to manipulate the Quad SPI Peripheral registers in order to do the write (you can’t just write to the memory location). That is the code that is in fespi_write. The default_flash_read just reads memory, it actually doesn’t do anything “flash” related really:
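
To illustrate the read/write asymmetry described in this reply, here is a small host-side model in C. It is not the OpenOCD source and does not touch real hardware: the flash chip, `spi_tx`, `spi_cs_release`, and the other names are hypothetical stand-ins. The opcodes 0x06 (write enable) and 0x02 (page program) are the usual SPI NOR values. Real code would also poll the status register and respect page boundaries, which this sketch skips.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FLASH_SIZE 256u   /* tiny model; real parts are megabytes */
#define CMD_WREN   0x06u  /* write enable */
#define CMD_PP     0x02u  /* page program */

/* Hypothetical model of the flash chip, for illustration only. */
static uint8_t flash_cells[FLASH_SIZE];
static int     write_enabled;
static uint8_t frame[4 + FLASH_SIZE];  /* bytes seen while chip select is low */
static size_t  frame_len;

void flash_model_reset(void)
{
    memset(flash_cells, 0xFF, sizeof flash_cells);  /* erased state */
    write_enabled = 0;
    frame_len = 0;
}

/* One byte shifted out on the SPI bus. */
static void spi_tx(uint8_t b)
{
    if (frame_len < sizeof frame)
        frame[frame_len++] = b;
}

/* Chip select released: the chip acts on the completed command. */
static void spi_cs_release(void)
{
    if (frame_len == 1 && frame[0] == CMD_WREN) {
        write_enabled = 1;
    } else if (frame_len > 4 && frame[0] == CMD_PP && write_enabled) {
        uint32_t addr = ((uint32_t)frame[1] << 16) |
                        ((uint32_t)frame[2] << 8) | frame[3];
        for (size_t i = 4; i < frame_len; i++, addr++) {
            if (addr < FLASH_SIZE)
                flash_cells[addr] &= frame[i];  /* programming only clears bits */
        }
        write_enabled = 0;  /* latch clears after each program */
    }
    frame_len = 0;
}

/* Write path: software must drive a command sequence through the SPI master. */
void flash_write(uint32_t addr, const uint8_t *data, size_t len)
{
    spi_tx(CMD_WREN);
    spi_cs_release();

    spi_tx(CMD_PP);
    spi_tx((uint8_t)(addr >> 16));
    spi_tx((uint8_t)(addr >> 8));
    spi_tx((uint8_t)addr);
    for (size_t i = 0; i < len; i++)
        spi_tx(data[i]);
    spi_cs_release();
}

/* Read path: from software's point of view, just a load from the window. */
uint8_t flash_read_mapped(uint32_t addr)
{
    return flash_cells[addr % FLASH_SIZE];
}

/* A plain store to the mapped window has no effect on the flash contents. */
void flash_store_mapped(uint32_t addr, uint8_t value)
{
    (void)addr;
    (void)value;
}
```

In this model a call such as `flash_store_mapped(0x10, 0xAB)` leaves the cell erased, while `flash_write` changes it, which is the distinction between the `.read` and `.write` entries of the driver struct.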

I shouldn’t have thought that doing a lw would put the data into the instruction cache. My experience is that every lw from flash takes on the order of 1 us.

@brucehoult you are correct. I will edit my response.

To perform a read from the QSPI, I think a read command should also be sent through the SPI master on the FE310-G000 to the QSPI peripheral registers. So does that mean an “lw” to the QSPI address is programmed in the decode stage to control the SPI master to initiate a read command?

To perform a read, the QSPI registers are not needed. The QSPI region is memory mapped, so the usual “target_read_memory” command is used to read the QSPI memory. It’s a nontrivial function that does a lot of things, but ultimately it writes a lw into the Program Buffer and stores the result back into the Program Buffer for retrieval by the debugger.
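
As a rough illustration of what "writing a lw into the Program Buffer" can look like, the C sketch below encodes an RV32I `lw` and pairs it with an `ebreak` that hands control back to the debugger. This is not how OpenOCD is actually structured; `rv_encode_lw` and `build_read_word_program` are names invented here, and the choice of register s0 (x8) is an assumption. How the loaded value travels back to the debugger is not modeled.

```c
#include <stdint.h>

#define RV_EBREAK 0x00100073u  /* returns control to the debugger */

/* Encode RV32I `lw rd, imm(rs1)`: I-type, opcode 0x03, funct3 0b010. */
uint32_t rv_encode_lw(unsigned rd, unsigned rs1, int32_t imm)
{
    return (((uint32_t)imm & 0xFFFu) << 20)
         | ((rs1 & 0x1Fu) << 15)
         | (0x2u << 12)
         | ((rd & 0x1Fu) << 7)
         | 0x03u;
}

/*
 * Fill a two-entry program buffer that loads one word.
 * Assumed usage: the debugger first places the target address in s0,
 * runs the buffer, then reads the loaded value back out of s0.
 */
void build_read_word_program(uint32_t progbuf[2])
{
    progbuf[0] = rv_encode_lw(8, 8, 0);  /* lw s0, 0(s0) */
    progbuf[1] = RV_EBREAK;
}
```

If s0 holds an address in the memory-mapped flash window, the `lw` is an ordinary load as far as the core is concerned; nothing in the instruction itself is flash-specific.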

So if I understood correctly, the “target_read_memory” function, which is implemented in the program initially loaded by OpenOCD, performs an “lw” that actually does the QSPI read?

So basically, disregarding the debugging functionality, OpenOCD loads a program that performs “write into flash”, and that same program has the read functionality that is called only when the processor wants to load instructions from flash into the I-Cache to fetch and execute them. Am I right?

Concerning the comment “To perform a read, the QSPI registers are not needed”: the data sheet of the IS25LP128 QSPI flash says that to perform a normal read, for example, a NORD instruction with the read command in byte 0 has to be sent to the QSPI along with the address in bytes 1, 2 and 3. Thus I am not sure how it is possible to read data without the QSPI registers.
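
For reference, the frame described above can be sketched in C as follows. The NORD opcode 0x03 is the standard SPI NOR normal-read command; the function names and the 0x20000000 flash window base are assumptions made for this illustration. My understanding is that a memory-mapped flash controller can generate such a frame in hardware when a load hits the flash window, which would mean software never writes the SPI registers for a read, but that should be verified against the FE310 manual.

```c
#include <stdint.h>

#define CMD_NORD        0x03u        /* normal read opcode */
#define FLASH_MAP_BASE  0x20000000u  /* ASSUMPTION: start of the mapped flash window */

/* Convert a CPU address inside the mapped window to an offset within the flash. */
uint32_t mapped_addr_to_offset(uint32_t cpu_addr)
{
    return cpu_addr - FLASH_MAP_BASE;
}

/* Build the 4-byte NORD frame: opcode in byte 0, 24-bit address in bytes 1..3. */
void build_nord_frame(uint32_t flash_offset, uint8_t frame[4])
{
    frame[0] = CMD_NORD;
    frame[1] = (uint8_t)(flash_offset >> 16);
    frame[2] = (uint8_t)(flash_offset >> 8);
    frame[3] = (uint8_t)flash_offset;
}
```

After these four bytes the flash shifts out data starting at the given offset for as long as chip select stays low.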