Equivalent RISC-V asm code


(Daniel kirubakaran) #1

What would be the equivalent RISC-V asm code for the following asm code snippet?
I am also curious to know what exactly this code does.

I do not know asm coding, which is why I am asking!

void tunedDelay(uint16_t delay) {
  uint8_t tmp=0;

  asm volatile("sbiw %0, 0x01 \n\t"
    "ldi %1, 0xFF \n\t"
    "cpi %A0, 0xFF \n\t"
    "cpc %B0, %1 \n\t"
    "brne .-10 \n\t"
    : "+r" (delay), "+a" (tmp)
    : "0" (delay)
    );
}

//SBIW:- [operands] Rd,K “Subtract Immediate from Word” Rd+1:Rd ← Rd+1:Rd - K
//LDI:- [operands] Rd,K “Load Immediate” Rd ← K
//CPI:- [operands] Rd,K “Compare with Immediate” Rd - K
//CPC:- [operands] Rd,Rr “Compare with Carry” Rd - Rr - C
//BRNE:- [operands] k “Branch if Not Equal” if (Z = 0) then PC ← PC + k + 1


(Bruce Hoult) #2

That’s AVR assembly language that could probably just as easily (and more portably) be written in C as:

void tunedDelay(uint16_t delay){
  do {
    delay--;
  } while (delay != 0xFFFF);
}

There are a couple of problems with using such code in other systems:

  • the compiler could optimise the whole thing away, as no result is produced
  • it will run at different speeds on different CPU cores or at different clock speeds
  • it will run at inconsistent or at least hard to calculate speeds on all but the simplest CPUs due to things such as instruction caches and branch predictors

On the HiFive1 (or any RISC-V) you are much better off using the timer register, which enables you to measure delays accurate to about 0.03 us (30 ns). That AVR code looks like it’s about 0.5 us per iteration at 16 MHz. See Delay function in freedom sdk
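As a rough check of the per-iteration estimate, here is a small sketch that adds up the cycle counts of the five AVR instructions in the loop. The counts are the ones documented for classic AVR cores; the function names are made up for illustration, and call/return overhead and the slightly cheaper final (not taken) branch are ignored.

```c
#include <stdint.h>

/* Cycle counts on classic AVR cores:
   SBIW = 2, LDI = 1, CPI = 1, CPC = 1, BRNE = 2 when the branch is taken. */
enum { AVR_LOOP_CYCLES = 2 + 1 + 1 + 1 + 2 };  /* 7 cycles per iteration */

/* Nanoseconds spent in one loop iteration at the given CPU clock. */
double avr_iteration_ns(uint32_t f_cpu_hz) {
    return (double)AVR_LOOP_CYCLES * 1e9 / (double)f_cpu_hz;
}

/* Approximate time in microseconds spent in the loop of tunedDelay(delay).
   The body runs delay + 1 times, because the loop only stops once the
   counter has wrapped around to 0xFFFF. Overhead outside the loop is
   ignored (an assumption of this sketch). */
double avr_tuned_delay_us(uint16_t delay, uint32_t f_cpu_hz) {
    return ((double)delay + 1.0) * AVR_LOOP_CYCLES * 1e6 / (double)f_cpu_hz;
}
```

Under these assumptions one iteration at 16 MHz takes 437.5 ns, which is in line with the "about 0.5 us" figure.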


(Daniel kirubakaran) #3

Yes! And they defined the delay values like this:

#if F_CPU == 16000000

static const DELAY_TABLE PROGMEM table[] =
{
// baud rxcenter rxintra rxstop tx
{ 115200, 1, 17, 17, 12, },
{ 57600, 10, 37, 37, 33, },
{ 38400, 25, 57, 57, 54, },
{ 31250, 31, 70, 70, 68, },
{ 28800, 34, 77, 77, 74, },
{ 19200, 54, 117, 117, 114, },
{ 14400, 74, 156, 156, 153, },
{ 9600, 114, 236, 236, 233, },
{ 4800, 233, 474, 474, 471, },
{ 2400, 471, 950, 950, 947, },
{ 1200, 947, 1902, 1902, 1899, },
{ 300, 3804, 7617, 7617, 7614, },
};

const int XMIT_START_ADJUSTMENT = 5;

#elif F_CPU == 8000000
…it goes.

So, can I use this one

without any doubt?


(Jim Wilson) #4

With gcc, I would suggest

void tunedDelay(uint16_t delay){
  do {
    delay--;
    asm volatile ("");
  } while (delay != 0xFFFF);
}

so that the loop doesn’t get optimized away at -O2 or -Os. But the timer delay functions Bruce pointed at are of course better ways to do this.


(Thomas Hornschuh) #5

The code snippet is from the Arduino SoftwareSerial library, I assume.
I remember it because a while ago I was working with this piece of code.
The whole library is very dependent on AVR features: not only the timing, but also the usage of the AVR GPIO. I think I remember that it also uses an edge interrupt. Conceptually it could be ported. It uses tunedDelay to time short delays of 1 to 1.5 bit times. The values in the table are the delays for, e.g., waiting after the start bit edge, from one bit to the next, etc.
The mtime register may not be suitable, because the RTC clock in the HiFive1 is only 32 kHz. So it may be much better to use the mcycle CSR; this will give dependable timings, independent of caches and branch predictors, at least as long as the core runs with a fixed clock.
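To make the resolution concern concrete, the following sketch compares the tick period of a 32768 Hz counter with the length of one serial bit. The helper names are hypothetical; only the 32768 Hz rate and the baud rates come from this thread.

```c
#include <stdint.h>

/* Duration of one timer tick in microseconds for a given tick rate. */
double tick_period_us(uint32_t tick_hz) {
    return 1e6 / (double)tick_hz;
}

/* Duration of one serial bit in microseconds for a given baud rate. */
double bit_time_us(uint32_t baud) {
    return 1e6 / (double)baud;
}

/* Number of whole timer ticks that fit into one bit time. */
uint32_t ticks_per_bit(uint32_t tick_hz, uint32_t baud) {
    return tick_hz / baud;
}
```

With a 32768 Hz counter one tick lasts roughly 30.5 us, so only three whole ticks fit into a bit at 9600 baud and none at 115200 baud. That is too coarse to time a software UART.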

Because I was already considering porting the soft serial to the HiFive1 I’m willing to help.


(Liviu Ionescu) #6

Please address this complaint to the ISA architects.

I already did this and received no answer.

The microcontroller profile proposal also addressed this, see The system clock… entry.


(Thomas Hornschuh) #7

Why do you think this is an ISA issue? First of all, timers are not part of the ISA; they are part of the privileged spec.
The ISA should be independent of a usage profile.

Second, the privileged spec says nothing about the clock rate for the mtime counter; it only says that it runs on a fixed clock. It is a decision of SiFive for the HiFive1 to use a 32768 Hz clock.

On my Bonfire CPU it runs with the cpu clock.

What is lacking in RISC-V is maybe additional "profiles" that specify such things for a specific use case.


(Daniel kirubakaran) #8

Yes, Mr. Thomas, you are right. I am actually trying to port it to the HiFive1, as my sensor requires UART communication. While reading through the C++ code to understand the concept, I got stuck on this asm code, so I raised this question. If you can help, I would really appreciate it.


(Liviu Ionescu) #9

It should be; I advocated for this a lot, but without success. Legally, the ISA specs include the privileged specs; see https://github.com/riscv/riscv-isa-manual/blob/master/src/intro.tex#L60-L67.


(Bruce Hoult) #10

You could, but you’ll need to make some changes.

First, you should add Jim’s empty asm() to prevent the compiler optimising away the useless loop.

Second, the HiFive1 executes the code in fewer clock cycles (two clock cycles per loop), and may also be running at a much higher clock speed (e.g. 256 MHz). So you’ll have to calculate new values for all the constants. The new constants for the slower baud rates will be too big for a 16 bit integer if you’re running the HiFive1 at more than 16 MHz, so you should change the uint16_t to uint32_t and the stop value to 0xFFFFFFFF.

I tried…

__attribute__ ((noinline))
void tunedDelay(uint32_t delay){
  do {
    delay--;
    asm volatile("");
  } while (delay != 0xFFFFFFFF);
}

… and got the RISC-V code …

2040013c <tunedDelay>:
2040013c:       57fd                    li      a5,-1
2040013e:       157d                    addi    a0,a0,-1
20400140:       fef51fe3                bne     a0,a5,2040013e <tunedDelay+0x2>
20400144:       8082                    ret
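The recalculation of the table constants could look roughly like the sketch below. It assumes 7 cycles per iteration for the AVR loop (SBIW 2 + LDI 1 + CPI 1 + CPC 1 + taken BRNE 2) and the 2 cycles per iteration mentioned above for the HiFive1. The function name is made up, and call overhead and any fine tuning of the original values are ignored, so the results are starting points to be checked on real hardware.

```c
#include <stdint.h>

enum {
    AVR_CYCLES_PER_ITER = 7,  /* assumed: SBIW 2 + LDI 1 + CPI 1 + CPC 1 + BRNE 2 */
    NEW_CYCLES_PER_ITER = 2   /* per the two-instruction loop shown above */
};

/* Convert a loop count tuned for the AVR at avr_hz into a loop count that
   takes about the same wall-clock time on a core running at new_hz. */
uint32_t rescale_delay(uint16_t avr_delay, uint32_t avr_hz, uint32_t new_hz) {
    /* Total AVR cycles, scaled to cycles of the new clock. */
    uint64_t num = (uint64_t)avr_delay * AVR_CYCLES_PER_ITER * new_hz;
    uint64_t den = (uint64_t)avr_hz * NEW_CYCLES_PER_ITER;
    return (uint32_t)((num + den / 2) / den);  /* round to nearest */
}
```

For the 300 baud entry 7617, rescaling from 16 MHz to 256 MHz gives 426552 under these assumptions, which no longer fits in a uint16_t and illustrates why the parameter type has to be widened.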

(Bruce Hoult) #11

Any given implementation of the ISA will conform to some privileged ISA spec, but there can be many different privileged ISA specs. There is only one user ISA spec.


(Bruce Hoult) #12

If you want to do delays of less than about ten seconds at 256 or 320 MHz (or four minutes at 16 MHz) then I’d suggest the following function:

#include <encoding.h>

__attribute__ ((noinline))
void delayClockCycles(unsigned long delay){
    unsigned long start_mcycle = read_csr(mcycle);
    do { } while ((read_csr(mcycle) - start_mcycle) < delay);
}    

Looks like about 9 clock cycles of overhead if you call that with a 0 argument, so about 35 ns at 256 MHz.
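To call a cycle-based delay with serial timings, the bit time first has to be converted into clock cycles. A minimal sketch, assuming the core clock frequency is known and fixed (the helper names and the idea of passing the frequency as a parameter are illustrative, not part of the Freedom SDK):

```c
#include <stdint.h>

/* Clock cycles in one serial bit, rounded to the nearest cycle. */
unsigned long cycles_per_bit(unsigned long f_cpu_hz, unsigned long baud) {
    return (f_cpu_hz + baud / 2) / baud;
}

/* Clock cycles in a delay given in microseconds (rounded down).
   The multiplication is done in 64 bits to avoid overflow where
   unsigned long is only 32 bits wide, as on RV32. */
unsigned long cycles_for_us(unsigned long f_cpu_hz, unsigned long us) {
    return (unsigned long)(((uint64_t)f_cpu_hz * us) / 1000000u);
}
```

A one-bit delay at 9600 baud on a 256 MHz core would then be delayClockCycles(cycles_per_bit(256000000UL, 9600)), i.e. about 26667 cycles; the fixed call overhead mentioned above could be subtracted for the shortest delays.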