GPIO Loopback


(MIke Field) #1

While learning how the GPIO works I did a quick test of how fast a signal can get between GPIO pins on the HiFive1 (including through the level shifters).

Rough timing is about 0.125us per transition (8,000,000 transitions per second).

Really brain-dead code below if anybody is interested - next up is to see how quickly it will work using interrupt on change.

PS. Don’t forget to add a jumper wire between Digital I/O 8 and Digital I/O 9 - (GPIO0 and GPIO1).

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include "platform.h"


volatile unsigned int* g_output_vals  = (unsigned int *) (GPIO_BASE_ADDR + GPIO_OUTPUT_VAL);
volatile unsigned int* g_input_vals   = (unsigned int *) (GPIO_BASE_ADDR + GPIO_INPUT_VAL);
volatile unsigned int* g_output_en    = (unsigned int *) (GPIO_BASE_ADDR + GPIO_OUTPUT_EN);
volatile unsigned int* g_pullup_en    = (unsigned int *) (GPIO_BASE_ADDR + GPIO_PULLUP_EN);
volatile unsigned int* g_input_en     = (unsigned int *) (GPIO_BASE_ADDR + GPIO_INPUT_EN);

int main(int argc, char **argv)
{
  int i;
  /* Set up GPIO 1 as output */
  *g_input_en   &= ~2;
  *g_output_en  |= 2;

  /* Set up GPIO 0 as input */
  *g_output_en  &= ~1;
  *g_input_en   |= 1;


  write (STDOUT_FILENO, "Start\n", 6);
  for(i=0;i<100000000;i++) {
    /* Clear GPIO1 */
    *g_output_vals &= ~2;

    /* Wait for it to appear on the input */
    while(*g_input_vals & 1)
       ;

    /* set GPIO1 */
    *g_output_vals |= 2;

    /* Wait for it to appear on the input */
    while(!(*g_input_vals & 1))
       ;
  }
  write (STDOUT_FILENO, "Done\n", 5);

  return 0;

}
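
One practical note on the listing above: if the jumper from the PS is missing, the two wait loops never exit. A minimal sketch of a bounded wait follows, assuming a hypothetical helper name (wait_for_level) and a poll limit that are not part of the original code. The extra counter adds a little overhead per poll, so it is better suited to checking the wiring than to the timing run itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the original post): poll *reg until the bits
 * selected by mask match the wanted level, or give up after max_polls reads.
 * Returns 1 if the level was seen, 0 on timeout.
 */
static int wait_for_level(volatile uint32_t *reg, uint32_t mask,
                          int want_high, uint32_t max_polls)
{
    assert(mask != 0);  /* an empty mask could never match a high level */

    for (uint32_t n = 0; n < max_polls; n++) {
        int is_high = (*reg & mask) != 0;
        if (is_high == (want_high != 0))
            return 1;   /* level observed */
    }
    return 0;           /* timed out: jumper missing or pin misconfigured */
}
```

On the board this could be called as wait_for_level(g_input_vals, 1, 0, 1000000) right after clearing GPIO1, printing an error if it returns 0.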

(Bruce Hoult) #2

So this code is taking 25 seconds to run?

What MHz are you running the CPU at?


(MIke Field) #3

The bootloader default… 250MHz or so (am away from the board at the moment). (update - bootloader showing 266305536 Hz)

But it isn’t the chip’s fault. It does have two level shifters and the low speed 0.1" headers to get through. I am sure if I hotwire the two GPIOs where they go into the level shifters it will be much quicker…


(MIke Field) #4

Humm - shorting the pin on the input to the level shifters made minimal difference in the time taken - I didn’t have enough hands free to start/stop a stopwatch, but it was close to 26 seconds watching a clock. Not what I was expecting!


(Bruce Hoult) #5

I’d have thought such things would have an effect on the ns level at most, not µs?


(Donnie Agema) #6

OP might have been better placed here: Speed of the E31 at its I/O pins


(MIke Field) #7

@dagema - Yep, it is quite possibly the wrong place in retrospect, but at the time I was testing the speed of the complete board (level shifters and all), not the raw capabilities of the chip, and it is the complete loop (out and back), not just how fast you can waggle a single output pin.

@Brucehoult. I found the level shifter’s datasheet at http://www.datasheetq.com/pdf/TI/671732.pdf. At 3.3V the delay is under 5ns for push/pull driving in either direction, or as long as 450ns for open drain low to high transition.


(Bruce Hoult) #8

Yup, and I posted a link to much the same document directly at TI http://www.ti.com/lit/ds/symlink/txs0108e.pdf a few days ago… Low output voltage?


(MIke Field) #9

I’ve got that “just walked into the wrong sort of bar and ordered a drink” feeling coming over me. :slight_smile:.

Is this forum for ‘generic hobbyist users’ like me who owns a HiFive1 to discuss what they are up to, and learn more about how HiFive1 / RISC-V can improve their lot?

Or is it for feedback to SiFive for technical issues on the chip/board, and for ‘sharing the experience’ threads I should go elsewhere?


(Megan A. Wachs) #10

The forums are for any and everything RISC-V / SiFive related! All experience levels welcome.

Thanks for running this experiment! Another experiment you could do is to determine how much is code overhead and how much is the actual delay on the GPIO. You could replace the check in your code for the input of 1 or 0 with a check on something you know is true (e.g. you could check the *g_output_en which is set to 1). And see how quick the code runs when you assume the IOs are infinitely fast.
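
A minimal sketch of that experiment, under some assumptions: the function name and its pointer parameters are illustrative and not from the thread, and they are there only so the same loop can be run against plain variables as well as the real registers. The wait loops poll a register bit that is already known to be set, so they cost one read and one test but never spin.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same loop shape as the loopback test, but each wait polls a bit that is
 * known to be set, modelling an "infinitely fast" I/O path.
 * On the board, out_vals would be the GPIO output-value register and known
 * the output-enable register (bit 1 was set during setup).
 */
static void toggle_assuming_instant_io(volatile uint32_t *out_vals,
                                       volatile uint32_t *known,
                                       uint32_t known_mask,
                                       uint32_t passes)
{
    assert(*known & known_mask);  /* otherwise the waits would hang */

    for (uint32_t i = 0; i < passes; i++) {
        *out_vals &= ~2u;                 /* clear GPIO1 */
        while (!(*known & known_mask))    /* never true: no waiting */
            ;

        *out_vals |= 2u;                  /* set GPIO1 */
        while (!(*known & known_mask))
            ;
    }
}
```

With the original globals this would be called as toggle_assuming_instant_io(g_output_vals, g_output_en, 2, 100000000). Any run time left over is code and register-access overhead rather than delay on the pins.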

We could also do an RTL simulation of your code and see the “theoretical” maximum.


(Bruce Hoult) #11

Both I hope :slight_smile:

I’m one of the three most dangerous things in the world: a programmer with a soldering iron!

I use small ARM boards such as Pi, Odroid XU-4 and C2 at work to prototype low level code that will be on phones later in a full friendly Linux environment, much faster than qemu. I’ve built one or two small things with Arduino at home (with protoshields of my own design), but definitely only hobby-level at hardware.

I hope RISC-V will feature professionally in future, but right now it’s just a hobby at home – or will be once my Founder’s Edition actually arrives! Chomping at the bit here and envious of you…

Can’t wait for the day I can get a Pi level board running Linux for a tad less than the $3500 for an FPGA one.


(MIke Field) #12

Thanks Megan for the encouragement.

It looks like the standard C “read / apply mask / write” operation used to flip bits in the GPIO register is relatively expensive - you can only do about 14M of these per second. Replacing these with simple assignments speeds things up by about 2x, so it looks like a register access from C takes about 35ns (unit corrected - thanks Douglas!).

Here are the various things I tried:

  1. Checking of the input replaced with checking a different register, with known value:
    27 seconds (No change from original) - about 7.5M transitions per second

  2. Removing the checking of the GPIO input completely, just toggling pins using bit masking operations
    18 seconds - about 11.1M transitions per second

  3. Replace the masking operations that are within the loop:

    *output_reg |= 2;
    *output_reg &= ~2;

with high and low values computed outside the loop:

 high = *output_reg | 2;
 low  = *output_reg & ~2;

and then assignments inside the loop, to replace read/mask/write operations with just a write.

 ...
 *output_reg = high;
 *output_reg = low;
 ...

7 seconds - about 28.1M transitions per second

  4. Then went back to monitoring the GPIO input:
    16.5 seconds - 12.1M transitions per second
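
The rates in the list follow from the run times, since each of the 100,000,000 loop passes produces two transitions (one low, one high). A small sketch of that arithmetic follows; the function and macro names are illustrative only.

```c
#include <assert.h>

#define LOOP_PASSES 100000000.0   /* loop count used in the test program */

/* Two pin transitions (low, then high) happen on every pass of the loop. */
static double transitions_per_second(double run_seconds)
{
    assert(run_seconds > 0.0);
    return 2.0 * LOOP_PASSES / run_seconds;
}

/* Average time spent on one transition, in nanoseconds. */
static double ns_per_transition(double run_seconds)
{
    return 1e9 / transitions_per_second(run_seconds);
}
```

For example, 18 seconds works out to about 11.1M transitions per second, and 7 seconds to 35 ns per transition. The hand-timed run lengths are rounded, so the computed rates can differ slightly from the quoted ones.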

This is the code for the final test:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdint.h>   /* uint32_t */
#include "platform.h"

volatile uint32_t * g_output_vals  = (uint32_t *) (GPIO_BASE_ADDR + GPIO_OUTPUT_VAL);
volatile uint32_t * g_input_vals   = (uint32_t *) (GPIO_BASE_ADDR + GPIO_INPUT_VAL);
volatile uint32_t * g_output_en    = (uint32_t *) (GPIO_BASE_ADDR + GPIO_OUTPUT_EN);
volatile uint32_t * g_pullup_en    = (uint32_t *) (GPIO_BASE_ADDR + GPIO_PULLUP_EN);
volatile uint32_t * g_input_en     = (uint32_t *) (GPIO_BASE_ADDR + GPIO_INPUT_EN);

int main(int argc, char **argv)
{
  uint32_t i;
  uint32_t high,low;

  /* Set up GPIO 1 as output */
  *g_input_en   &= ~2;
  *g_output_en  |= 2;
 
  /* Set up GPIO 0 as input */
  *g_output_en  &= ~1;
  *g_input_en   |= 1;

  /* Pre-calculate the GPIO output register value */
  low   = *g_output_vals & ~2;
  high  = *g_output_vals | 2;

  write (STDOUT_FILENO, "Start\n", 6);

  for(i=0;i<100000000;i++) {
    /* Clear GPIO1 and wait for it to appear on the input */
    *g_output_vals = low;
    while(*g_input_vals & 1)
      ;

    /* Set GPIO1 and wait for it to appear on the input */
    *g_output_vals = high;
    while(!(*g_input_vals & 1))
      ;
  }
  write (STDOUT_FILENO, "Done\n", 5);
  return 0;
}

(Bruce Hoult) #13

“so it looks like a register access from C takes about 35us”

ns. So about nine clock cycles if you’ve still got the same clock frequency. That’s not too bad.
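
The cycle estimate can be checked with a one-line conversion. This is a sketch with an illustrative function name, assuming the clock is still the 266305536 Hz the bootloader reported earlier in the thread.

```c
#include <assert.h>

/* Convert a duration in nanoseconds to CPU clock cycles at clock_hz. */
static double ns_to_cycles(double ns, double clock_hz)
{
    assert(ns >= 0.0 && clock_hz > 0.0);
    return ns * 1e-9 * clock_hz;
}
```

ns_to_cycles(35.0, 266305536.0) comes out a little over nine cycles, and the original 125 ns per loopback transition is roughly 33 cycles.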


(Dave) #14

That’s the explicitly stated goal of the lowRISC chaps.

It’s some way off, I’m sure, but the big milestone I’m waiting for is a RISC-V SoC which includes a GPU capable of, say, 1080p and some level of GL acceleration.


(Bruce Hoult) #15

Yes, I’ve been following Alex Bradbury’s activities (and twitter) for several years. Can’t wait.

GPU would be nice, of course, but I’m happy with something I can ssh into (and remote X).


(Donnie Agema) #16

Do you mean something like this:

https://shop.trenz-electronic.de/en/27229-Bundle-ZynqBerry-512MB-SDSoC-Voucher-only-while-stocks-last


(Bruce Hoult) #17

What instruction sets can you program into that? How many cores? How many MHz?

It’s three times the price of a quad core Raspberry Pi or Odroid C2 (quad core Aarch64, 1.5 - 2.0 GHz, 2 GB RAM, gigE, GPU, eMMC and UHS-1 micro SD) so it had better be pretty good!


(Dave) #18

I wonder how much of the price is the Xilinx tool license? This part has me a little perplexed:

1 x SDSoC-Zynq Development voucher from Xilinx (only while stocks last)

What happens when they run out of licenses but have hardware left to ship? Is it then only useful to people who already have purchased the software separately?

Behaviour like this is one of the reasons I’m so excited about what SiFive are doing. It’s meaningless to have “open” hardware if the tools and documentation to actually make use of it are so hostile to the user.


(Donnie Agema) #19

Licenses are available separately for $10 by Digilent.


(Donnie Agema) #20

Nope, not what you’re look’n for!