Fast Methods for Toggling GPIO Pins

This code will toggle the Arduino pins 8 through 13 every ~29ns:

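#include "platform.h"  // assumed: Freedom E SDK header providing GPIO_REG and the GPIO_* offsets

int main(int argc, char **argv)
{
  // Configure GPIO 0-5 (the bits selected by mask) as outputs,
  // with input and pull-up disabled.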

  const unsigned int mask = 0x3f;

  GPIO_REG(GPIO_INPUT_EN)  &= ~mask;
  GPIO_REG(GPIO_PULLUP_EN) &= ~mask;
  GPIO_REG(GPIO_OUTPUT_EN) |=  mask;

  unsigned int base = GPIO_REG(GPIO_OUTPUT_VAL) & ~mask;

  while(1) {
    GPIO_REG(GPIO_OUTPUT_VAL) = base;
    base ^= mask;
  }

  return 0;
}

Any ideas on how to manipulate GPIO pins faster? (Unrolling the loop didn’t seem to help.)
This seems to be the resulting assembly code:

204001d4:	00f72623          	sw	a5,12(a4)
204001d8:	03f7c793          	xori	a5,a5,63
204001dc:	ff9ff06f          	j	204001d4 <main+0x34>

If your application’s pin layout allows it, what about using the PWM module?

Megan A. Wachs (mwachs5, Verified SiFive Account):

There are logical constraints and physical ones.

Logically, if you can use the PWM or other HW to drive the pins, then you could achieve CPU frequency / 2. If you have to bit-bang with software, the fastest you can achieve is CPU frequency / 15 (using atomics).

Physically, the FE310’s I/Os were constrained at 100MHz, and the Arduino headers aren’t designed for high speed I/O.
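As a rough illustration of how these limits combine, here is a minimal sketch that turns the figures quoted above (CPU frequency / 2, CPU frequency / 15, and the 100 MHz I/O constraint) into an upper bound for a given clock. The function and macro names are hypothetical and only the three numbers come from this thread.

```c
#include <stdint.h>

/* Limits quoted in this thread; the names are hypothetical. */
#define PWM_DIVISOR      2u          /* hardware-driven: CPU frequency / 2 */
#define BITBANG_DIVISOR  15u         /* software with atomics: CPU frequency / 15 */
#define PAD_LIMIT_HZ     100000000u  /* FE310 I/O constraint: 100 MHz */

/* Upper bound on the achievable pin rate in Hz for a given CPU clock
   and divisor, capped by the pad limit. */
static uint32_t max_pin_rate_hz(uint32_t cpu_hz, uint32_t divisor)
{
    uint32_t rate = cpu_hz / divisor;
    return rate < PAD_LIMIT_HZ ? rate : PAD_LIMIT_HZ;
}
```

For example, max_pin_rate_hz(16000000u, BITBANG_DIVISOR) gives the software bound at a 16 MHz clock, while at a hypothetical 320 MHz clock the PWM bound would be clipped by the pad limit rather than by the divisor.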

Practically, we are easily bit-banging the control for WS2812 LEDs (aka NeoPixels) which have real-time requirements of ± 150 ns with just C code loops (no hand-crafted assembly).

Using an amoxor instruction will reduce the instruction count in the loop. At what CPU frequency are you running?

Hi Krste, can you please kindly provide a snippet on how to use amoxor to toggle GPIO pins? Thanks in advance.

Instead of…

while(1) {
  *ptr = base;
  base ^= mask;
}

… I think this would suffice as a direct replacement …

while(1) {
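  // volatile and the "memory" clobber keep the compiler from dropping or reordering the store
  asm volatile("amoxor.w x0, %0, 0(%1)" : : "r"(mask), "r"(ptr) : "memory");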
}

Or you could replace x0 with base if you want to be able to keep track of the current state.
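For anyone who would rather avoid inline assembly, a possible alternative is C11's atomic fetch-xor, which also returns the previous value and so covers the "keep track of the current state" case. This is only a sketch: gpio_toggle is a hypothetical helper, and treating the GPIO output register as an _Atomic object is an assumption. On a RISC-V target with the A extension, compilers typically lower this to amoxor.w, but that is worth confirming in the disassembly.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: atomically XOR `mask` into the word at `reg` and
   return the value the word held before the toggle. */
static inline uint32_t gpio_toggle(volatile _Atomic uint32_t *reg, uint32_t mask)
{
    return atomic_fetch_xor_explicit(reg, mask, memory_order_relaxed);
}
```

Because the helper only takes a pointer, it can be exercised on a host against an ordinary _Atomic variable before pointing it at the real register.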

Hi, all,

I tried the code Steven Burs posted, but the output pins do nothing UNTIL I add some other code inside the loop, e.g., delay(1).

Has anyone else seen this behavior? Any idea how to avoid/correct it? (I had thought the GPIO_REG definition wasn’t volatile and the loop code was being “optimized to nothing”, but the GPIO_REG is defined properly).

Joe
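For readers unsure what "defined properly" means here, this is a minimal sketch of a volatile register-access macro. Everything prefixed FAKE_ or fake_ is hypothetical and stands in for the real register block so the code can run on a host; it is not the SDK's definition. The point is only that the volatile qualifier obliges the compiler to emit every load and store in the loop.

```c
#include <stdint.h>

/* Hypothetical stand-in for the GPIO register block (host-side only). */
static uint32_t fake_gpio_block[16];

/* Hypothetical access macro; `offset` is in bytes. The volatile cast is
   what stops the compiler from folding repeated accesses together. */
#define FAKE_GPIO_REG(offset) \
    (*(volatile uint32_t *)((uintptr_t)fake_gpio_block + (offset)))

/* Byte offset 12, matching the `sw a5,12(a4)` in the listing above. */
#define FAKE_GPIO_OUTPUT_VAL 0x0C

/* Toggle the bits in `mask` n times with a read-modify-write. */
static void toggle_n(uint32_t mask, unsigned n)
{
    while (n--) {
        FAKE_GPIO_REG(FAKE_GPIO_OUTPUT_VAL) ^= mask;
    }
}
```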

Thank you Bruce. The example you provided worked wonders for me.

I was able to toggle a bit once every ~600 ns with amoxor at a 16 MHz clock speed. However, if I compute the high and low values before I start toggling and just do lw and sw, I get ~500 ns per toggle (assuming interrupts are disabled). In that case, the amoxor instruction does not seem as efficient…
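To put those timings in terms of CPU cycles, a small conversion helper is enough. The function name is hypothetical; the inputs are the figures from this post.

```c
/* Convert a measured time per toggle (ns) into CPU cycles at cpu_mhz. */
static double ns_to_cycles(double ns, double cpu_mhz)
{
    return ns * cpu_mhz / 1000.0;
}
```

At 16 MHz, 600 ns works out to 9.6 cycles per toggle and 500 ns to 8 cycles.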

Edit: I have found that doing sw %[hi], 0(%[gpio]) and sw %[low], 0(%[gpio]) back to back sometimes yields unstable results, between 500 and 700 ns.
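The precomputed approach described above can be sketched in plain C roughly as follows. toggle_precomputed is a hypothetical helper, and the caller is assumed to pass the address of the GPIO output-value register and to have disabled interrupts. It makes no attempt to address the timing jitter mentioned in the edit.

```c
#include <stdint.h>

/* Compute the high and low register values once, then emit `pulses`
   high/low pairs using plain stores only. */
static void toggle_precomputed(volatile uint32_t *reg, uint32_t mask,
                               unsigned pulses)
{
    const uint32_t lo = *reg & ~mask;  /* mask bits cleared, others kept */
    const uint32_t hi = lo | mask;     /* mask bits set, others kept */

    while (pulses--) {
        *reg = hi;
        *reg = lo;
    }
}
```

Bits outside the mask keep whatever value they had when the function was entered, which mirrors how the original loop builds base from the current register contents.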

Can you send me the whole program, please?