Serial I/O Paradigm -- being amoorous

A two-step process is common practice in serial data transfer: first transmit data down the wire, then receive data back from the wire. After transmitting, wait to make sure the data actually goes out. Before receiving, wait to make sure data actually comes back in. Not considered here is any dwell needed between a successful transmit and the start of receive; that is usually arbitrated with other (out-of-band) signaling from slow targets or long-running tasks requested by the serial host.

The return value rd from the atomic amoxxx instructions is the value read from memory before the operation. Is this subtle point significant when using atomic operations with external, asynchronous hardware peripherals?

For example, amoor.w rd, rs2, (rs1) performs a trio of steps internally and without interruption: (a) read; (b) modify and write; and (c) return.

a. t = M(rs1)
b. M(rs1) = t | rs2
c. rd = t
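
The three steps above can be sketched in plain C to make the return-value semantics concrete. This is only a model under stated assumptions: amoor_w_model is a hypothetical helper that works on ordinary memory, it is not the hardware instruction, and it says nothing about how a peripheral register reacts to the access.

```c
#include <stdint.h>

/* Hypothetical software model of: amoor.w rd, rs2, (rs1)
 * It mirrors steps (a)-(c) on ordinary memory. Real hardware performs
 * the three steps as one indivisible operation; this model is simply
 * single-threaded, so nothing can intervene between them. */
static uint32_t amoor_w_model(uint32_t *addr, uint32_t rs2)
{
    uint32_t t = *addr;   /* (a) read the old value          */
    *addr = t | rs2;      /* (b) write back old value OR rs2 */
    return t;             /* (c) rd receives the OLD value   */
}
```

For instance, with memory holding 0xF0 and rs2 = 0x0F, the model returns 0xF0 in rd and leaves 0xFF in memory: rd always reports the state before the write, never after it.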

amoor.w is suggested as an ideal way to write only when ready. My question is: what happens if the time between the internal mem-read (a) and mem-write (b) steps of an atomic operation exceeds the time necessary to successfully transmit and clear the txdata status, especially for slow-speed serial transfer rates where t(sclk) < (fifo_sz * tlclk)? In other words, after a mem-read which shows “not ready yet”, the txdata device may become ready, so that the subsequent internal mem-write (b) succeeds. The atomic operation's return value (c) would then give a false indication, showing a failure to transmit.

An example with the SPI block is shown below, which applies similarly to UART and I2C.

ser_transfer:
  lui t0, SPI1_BASE
  addi t2, t0, SPI1_TXDATA

wait_tx_xmit:
  amoor.w t1, a0, (t2)     # lw t1, SPI1_TXDATA(t0)
  bnez t1, wait_tx_xmit    # bnez t1, wait_tx_xmit
#         ^                # sw a0, SPI1_TXDATA(t0)
#         |                           ^
#         |                        BETTER?
#         |                           |
#         |<----------- ? ----------->|
#  time between mem-read and mem-write may be
#  insufficient to clear tx fifo

wait_rx_rcv:
  lw a0, SPI1_RXDATA(t0)
  srli t1, a0, 31          # lui t1, (1<<(20-1))
  bnez t1, wait_rx_rcv     # and t1, t1, a0
#         ^                # bnez t1, wait_rx_rcv
#         |                           ^
#         |<----------- ? ----------->|
#  which is better?
#  sometimes hangs with 0x80000000 always in a0

  andi a0, a0, 0xFF  # [7:0]
  ret

Mentioned in @benno’s earlier discussion (Amoswap on uart.txdata register) is the highly relevant point that “… writes to the txdata register with the txfull bit set will quash the sending of the byte (effectively making it so that write is ignored by the device).” I am not sure whether an FE310-Gxxx part implements this, nor, more generally, what the best practices are for when to use, and when not to use, amoxxx operations.
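
To make the quoted behaviour concrete, here is a small C model of a txdata register whose writes are dropped while the full flag is set. Everything in it is an assumption for illustration: struct tx_model, txdata_amoor, tx_drain_one and the FIFO depth of 8 are hypothetical names and values, the full flag is placed in bit 31 only because the bnez test in the listing above implies it, and the model assumes the device sees the read and the write as one access. Whether a given FE310-Gxxx part behaves this way is exactly the open question.

```c
#include <stdint.h>

#define TX_FULL    (1u << 31)  /* assumed: full flag reads back in bit 31 */
#define FIFO_DEPTH 8u          /* assumed depth, for illustration only    */

/* Hypothetical device model, not the real SPI/UART block. */
struct tx_model {
    uint8_t  fifo[FIFO_DEPTH];
    uint32_t count;            /* bytes currently queued */
};

/* Model of an amoor.w aimed at txdata. The returned value is the register
 * contents BEFORE the write; if the FIFO was full at that moment, the
 * write is quashed, as described in the quoted discussion. */
static uint32_t txdata_amoor(struct tx_model *dev, uint32_t value)
{
    uint32_t old = (dev->count == FIFO_DEPTH) ? TX_FULL : 0u;
    if (old == 0u)
        dev->fifo[dev->count++] = (uint8_t)value;  /* accepted */
    return old;                                    /* nonzero means dropped */
}

/* Model of the shifter taking the oldest byte off the FIFO. */
static void tx_drain_one(struct tx_model *dev)
{
    if (dev->count > 0u) {
        for (uint32_t i = 1; i < dev->count; i++)
            dev->fifo[i - 1] = dev->fifo[i];
        dev->count--;
    }
}
```

In this model the return value and the fate of the write always agree: a zero result means the byte was queued, a nonzero result means it was ignored and must be retried. The false-failure case I am worried about can only arise if real hardware lets the FIFO drain between the read and the write halves of the AMO, which the model deliberately rules out.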