Best way to do multi-precision integer compares on RISC-V?

I’m writing my own 64-bit integer math routines for RV32. Call me crazy. Actually, it’s for the RISC-V target of my Forth cross-compiler project, muforth.

Having never written assembler code for an architecture that lacks condition codes, I was at first at a loss about how to approach the problem. After some thinking, I realized that I could use the SLTU instruction to calculate the carry (for add) and borrow (for subtract). I tested that code, and it seems to work correctly.

But doing 64-bit signed and unsigned compares - especially doing it without any branch instructions - mystified me even more. I initially tried to modify my 64-bit subtraction code, but I kept getting wrong answers.

After yet more thinking, I realized a way to get the correct answer, but it requires the use of a conditional branch.

Here is my method:

  • Compare the high-order words. If they are equal, return as the result the unsigned comparison of the low-order words.

  • Otherwise, ignore the low-order words and:

    • For unsigned values, return the unsigned comparison of the high-order words
    • For signed values, return the signed comparison of the high-order words

I tested this code and it seems to work, but it requires a BEQ or BNE instruction and two separate code paths. What I am wondering is, am I missing something? Is there a better way?

Sorry about the re-invention of the wheel here. :wink:

With no condition codes (and with SLTU etc), it means if you can write it in C then you can write it exactly the same in assembler :slight_smile:

struct intdp {
    unsigned int h, l;

bool lt(intdp a, intdp b){
    bool h_lt = a.h < b.h;
    bool h_eq = a.h == b.h;
    bool l_lt = a.l < b.l;
    return h_lt | h_eq & l_lt;

   2: 00d5b6b3  sltu a3,a1,a3
   6: 40c505b3  sub  a1,a0,a2
   a: 0015b593  seqz a1,a1
   e: 8eed      and  a3,a3,a1
  10: 00c53533  sltu a0,a0,a2
  14: 8d55      or   a0,a0,a3

Not very short, but what can you do?

x86_64 doesn’t do any better on this algorithm, by the way – one instruction and two bytes longer:

 0:	48 39 ce             	cmp    %rcx,%rsi
 3:	0f 92 c1             	setb   %cl
 6:	48 39 d7             	cmp    %rdx,%rdi
 9:	0f 94 c0             	sete   %al
 c:	21 c8                	and    %ecx,%eax
 e:	48 39 d7             	cmp    %rdx,%rdi
11:	0f 92 c2             	setb   %dl
14:	09 d0                	or     %edx,%eax

Using gcc’s built in __int128 with the “<” operator on x86 also gives one instruction more than the branchless RISC-V code … including three conditional branches! Of course if you get lucky and the high words differ then you only execute three or four of them. And it’s four bytes shorter than the branchless RISC-V code.

22:	48 39 ce             	cmp    %rcx,%rsi
25:	b0 01                	mov    $0x1,%al
27:	7c 09                	jl     32 <_Z3iltnn+0x10>
29:	7f 05                	jg     30 <_Z3iltnn+0xe>
2b:	48 39 d7             	cmp    %rdx,%rdi
2e:	72 02                	jb     32 <_Z3iltnn+0x10>
30:	31 c0                	xor    %eax,%eax