Immediately, 64 bits

Before I start, let me say that this is incredibly poorly documented. I’ve spent more than a few hours trying to answer this question. Not exactly a board issue — but, risc-v-wise, for-better-or-worse, y’all are my peeps.

So… with RV32I, we have “li” … which breaks down into several instructions … mostly load-shift-load-shift-load … if you want to load a 32 bit value. You only have 12 bits. I get it. Note that the shifts could also include other math to construct a 32 bit value.

Anyways… I also note that we have lui and auipc. Those make eminent sense to me. Sure. With 20 bits there (loaded into the upper part of the destination register), you can then li the lower 12. Complication? Sign extension? Nothing makes this crystal clear, but I’m sure it must work out; it would be rather embarrassing otherwise.

OK. Now for RV64I, lui and auipc still load bits 13 to 32 … and then sign extend to 64 bits. This seems to mean that loading a 64-bit value is a lot more complex. Two instructions could load 32 bits, … and then you could shift, but without a 2nd register, you can’t use lui or auipc again, because they sign extend. If you used two registers, you could lui, li; shift (first register) lui, li (2nd register) and then OR them… but that’s getting close to load-shift-load-shift-load-shift-load-shift-load in complexity.

Am I even close to what’s happening? Is there any reason this isn’t spelled out in detail in any of the dozens of instruction set overviews and risc-v courses I’ve skimmed through looking for this information? I mean… my main sources for this information are actually compiler explorer ???

Anyways… I’d love some comments, pointers or example code. I don’t need this right now — but I like risc-v and this issue has been keeping me up at night.

I’m not going to answer your question directly. I will give you some background on why this is confusing, and ultimately redirect you to the RISC-V Assembly Programmer’s Manual, where your question really belongs.

First, please acknowledge that the RISC-V ISA does not define li. If you search riscv-spec.pdf, you’ll notice that the only hits are in assembly-code examples in one of the appendices. (Arguably a defect.)

li is a pseudoinstruction defined by the assembly language. Here is its “formal” definition. The issue here is that the RISC-V assembly language — a formal language — really hasn’t been properly (formally) specified. Compare that “manual” with the specifications of other programming languages. I think you’ll agree it’s lacking. (You’re welcome to help. I suggest you first poke around through the open PRs and Issues, to understand the type of people you’re dealing with.)

The reality is that the RISC-V assembly language was developed in an ad-hoc manner by the developers of the GNU C compiler and GNU assembler, in the early days of RISC-V, and they unilaterally introduced things like li because they made their lives easier. The existence of such pseudoinstructions is rather contentious in the community, especially among Programming Language purists, who dislike that, e.g., li expands into “myriad sequences”, instead of a clean 1-to-1 mapping from an assembly-language statement to machine-code instruction. And the purists are especially bitter that these unclean language features are effectively grandfathered in.

In summary, the ISA doesn’t tell you how to create a constant in an X-register. It gives several low-level building blocks that you can use to do it yourself, in many different ways. For example, you could simply create a 64-bit constant in the .data section and load it with ld, no arithmetic at all (assuming you have its address…). The assembly language — or at least the GNU and LLVM implementations of it — provides a pseudoinstruction li that invokes undocumented assembler black magic to construct the constant. If you want to understand that magic yourself, you’ll need to follow up with questions in places like riscv-asm-manual, where the toolchain wizards hang out. (I suspect nobody remembers the magic, and someone will have to poke through the GNU assembler source tree to figure it out.)

Man… is it me, or am I feeling hostility in a reply on this board again? Geez.

So… I will try to defuse… if I even can. Yes… I even describe “li” as something that breaks down to several instructions. I didn’t call it a pseudo-instruction directly, but I’m aware of the term and that “li” is one. I don’t particularly care about its lineage or its provenance. I’m not even here about its controversy. I probably will not join the other board … because my frustration is somewhat fleeting.

I am aware that loading a register can take many forms. Mostly (as I did mention looking at compiler explorer) I’m interested in what the compiler emits. In general (as you might also know if you read here regularly, I’m a FreeBSD person) I pay attention to what clang emits. Less so with GCC.

I suppose I was mostly interested in how lui and auipc fit with RV64I/G, but I’ll return you to your regular low traffic life… I have read the RISC-V Assembly Programmer’s Manual … and I don’t find it does a good job at introducing the idioms expected of RISC-V — which is what my question was fishing for. In fact, I haven’t found any reference so far that deals adequately with idioms of immediate use, really.

For some context, why am I here, in this forum? I care deeply about owning computers that I truly control. This does require that I understand the operating system at a deep level. It requires that I understand my hardware at a deep level. The Unmatched doesn’t meet my ideal, but it’s the closest anyone has come to my ideal and it has the possibility of improving in the direction of my ideal. I have posted about this several times, and it has also met with hostility rather than answers, so I only mention it again here to fend off any religious wars regarding pseudo-instructions. I don’t have a horse in that race.

Hi,

I reread my response and don’t see the hostility. I certainly don’t mean any. Sorry if it came across that way.

Perhaps you are reacting to me redirecting your question away from the SiFive forums. I think we agree that your question doesn’t concern the HiFive Unmatched dev-board per se; it is a general conceptual question about RISC-V. SiFive’s policy is to redirect general RISC-V questions to the RISC-V forums, in order to keep these forums focused on issues specific to our dev-boards. If you have feedback on this policy, I can share it internally with the people who decide these things.

As far as I know, there is no good guide on RISC-V idioms for constructing immediates via lui, auipc, etc. I think this would be a great addition to the Assembly Programmer’s Manual. (I wish RISC-V Int’l would support that project better.) Personally, I use li and don’t think too hard about it, unless benchmarking reveals the generated code sequence to be a performance bottleneck, in which case I’ll start picking at it.

Sorry that I don’t have anything helpful.

Best,
Nick Knight
Algorithms and Libraries Team

I suppose I admitted it wasn’t 100% on-topic when I posted — but my point was I didn’t want another rabbit hole. Heh. Sigh. I think it was the “first please acknowledge” that I found hostile. With 2 years being a hermit, it’s hard to read people’s interactions :).

I am glad that someone else is having similar problems. Maybe the time is ripe for a really good intro course that covers some of these erstwhile major differences…

The ISA provides a way to load a 32-bit constant and this is documented in multiple places.
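For reference, that 32-bit idiom is lui followed by addi. Because addi sign-extends its 12-bit immediate, the upper 20 bits have to be bumped by one whenever bit 11 of the constant is set. Here is a minimal Python sketch of that arithmetic. The helper names are made up for illustration, and the code only simulates the two instructions on a 32-bit register; it is not taken from any toolchain.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def split_hi20_lo12(const32):
    """Return (hi20, lo12) such that lui hi20 + addi lo12 rebuilds const32."""
    lo12 = sign_extend(const32, 12)             # addi sign-extends its immediate
    hi20 = ((const32 - lo12) >> 12) & 0xFFFFF   # absorbs the +1 when lo12 < 0
    return hi20, lo12

def lui_addi_rv32(hi20, lo12):
    """Simulate `lui rd, hi20` then `addi rd, rd, lo12` on a 32-bit register."""
    rd = (hi20 << 12) & MASK32
    return (rd + lo12) & MASK32

hi20, lo12 = split_hi20_lo12(0xDEADBEEF)
print(hex(hi20), lo12, hex(lui_addi_rv32(hi20, lo12)))
```

For 0xDEADBEEF this gives hi20 = 0xDEADC (not 0xDEADB) and lo12 = -273, which is the sign-extension wrinkle asked about earlier in the thread.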

The ISA does not provide a way to load a 64-bit constant. This is an optimization issue. Optimizations are generally not going to be in basic documents, and not standardized, because there is always a chance someone might think of a better optimization, and you don’t want to prevent them from using it.

GNU as will accept li with a 64-bit constant, but it is a mistake to use it. That will likely give you the worst possible code.

Generally, putting the constant in memory and loading it is a good approach, and likely to be faster than trying to generate with lui/addi/slli/etc unless the constant has a long string of 0s or 1s that make it easy to generate.

GCC tries a dozen different strategies for generating a constant, and then chooses the cheapest one. The exact algorithm is subject to change at any time as new extensions are added, or we think of new ways to construct constants. If we don’t find a cheap enough one, we can just load it from memory. You can look at the gcc code if you want but this is probably not very interesting.
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/riscv/riscv.c;h=a545dbf66f734855090568ce1a253b373345eec0;hb=HEAD#l395

The ISA does provide encodings for 48-bit, 64-bit, 96-bit, etc instructions. If you want a better way to load a 64-bit constant, you can always add a wider instruction with more immediate bits. You could add a 64-bit instruction with 52 bits of immediate for instance which then lets you load a 64-bit constant with that instruction plus addi.
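The arithmetic behind that suggestion (52 + 12 = 64) can be sketched in Python. To be clear, the wide instruction here is purely hypothetical: no such instruction is defined, and the assumed semantics (place a 52-bit immediate in bits 63..12 of rd) are just one way it could work. The sign compensation for addi is the same as in the 32-bit case.

```python
MASK64 = (1 << 64) - 1

def split_hi52_lo12(const64):
    """Return (hi52, lo12) for a hypothetical 52-bit-immediate load plus addi."""
    lo12 = const64 & 0xFFF
    if lo12 >= 0x800:                 # addi would sign-extend this to a negative value
        lo12 -= 0x1000
    hi52 = ((const64 - lo12) >> 12) & ((1 << 52) - 1)
    return hi52, lo12

def wide_load_then_addi(hi52, lo12):
    """Simulate the hypothetical `rd = hi52 << 12` followed by `addi rd, rd, lo12`."""
    return ((hi52 << 12) + lo12) & MASK64

hi52, lo12 = split_hi52_lo12(0x123456789ABCDEF0)
print(hex(hi52), lo12, hex(wide_load_then_addi(hi52, lo12)))
```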

The latest official spec (20191213) has a chapter which is described as “a placeholder for an assembly programmer’s manual”. The spec has two appendices, and this is not one of them. There’s no disclaimer that clarifies what impact it being a placeholder has. Clearly, li is part of the spec while also being poorly documented.

The RISC-V Reader describes li as “Loads a constant into x[rd], using as few instructions as possible. For RV32I, it expands to lui and/or addi; for RV64I, it’s as long as lui, addi, slli, addi, slli, addi, slli, addi.” The authors of the book and spec overlap.
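To make that lui, addi, slli, addi, … pattern concrete, here is a Python sketch that peels off 12 bits at a time and then simulates the result on a 64-bit register. It is an illustration only, not the GNU assembler's or LLVM's actual code, and the function names are made up. One deliberate difference from the Reader's wording: the sketch uses addiw right after lui, so that 32-bit values such as 0x7FFFFFFF survive lui's sign extension on RV64.

```python
MASK64 = (1 << 64) - 1

def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def materialize(value):
    """Return (mnemonic, immediate) pairs that build `value` in one RV64 register."""
    value = sign_extend(value, 64)
    lo12 = sign_extend(value, 12)
    if -(1 << 31) <= value < (1 << 31):       # fits in 32 bits: lui and/or addi(w)
        hi20 = ((value - lo12) >> 12) & 0xFFFFF
        if hi20 == 0:
            return [("addi", lo12)]           # addi rd, x0, lo12
        seq = [("lui", hi20)]
        if lo12:
            seq.append(("addiw", lo12))       # result is sign-extended from 32 bits
        return seq
    hi = (value - lo12) >> 12                 # exact, the low 12 bits are now zero
    zeros = (hi & -hi).bit_length() - 1       # fold trailing zeros into the shift
    seq = materialize(hi >> zeros) + [("slli", 12 + zeros)]
    if lo12:
        seq.append(("addi", lo12))
    return seq

def simulate(seq):
    """Execute the sequence on a simulated 64-bit register that starts at zero."""
    rd = 0
    for op, imm in seq:
        if op == "lui":
            rd = sign_extend(imm << 12, 32)
        elif op == "addiw":
            rd = sign_extend(rd + imm, 32)
        elif op == "addi":
            rd = sign_extend(rd + imm, 64)
        elif op == "slli":
            rd = sign_extend(rd << imm, 64)
    return rd & MASK64

for op, imm in materialize(0x123456789ABCDEF0):
    print(op, imm)
```

With this scheme a sequence never exceeds eight instructions, which matches the longest form the Reader lists, and constants with long runs of zeros come out much shorter.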

It seems like li was intentionally added and its description in the Reader claims that it is supported equally on RV32I and RV64I, which the spec also implies.

While we’re discussing the history of the literature, I’ll point out that this placeholder was removed 18 months ago in commit a40f3 (see also PR #540). The latest can be found here. Perhaps the older version you cited still counts as “latest official” — I don’t know who decides that — but in any event I’d expect this placeholder to disappear from the next official release.

I agree that "li was intentionally added and […] is supported equally on RV32I and RV64I." My earlier post was just clarifying this statement, that li is part of the assembly language, not the instruction set architecture. For me, this resolves the question of why the semantics of li is so vague, compared to (actual) instructions like lui. (“As few instructions as possible” …? I don’t think any assembler, including GNU, achieves that.) But I acknowledge that this completely misses the point of the original question. I think Jim understood the original question, and provided useful information.

Thanks for the clarification. I’m glad they removed it from the spec, since it should make it less confusing.

That repo’s README says: Official versions of the specifications are available at Specifications - RISC-V International

Thanks guys. My main issue was that the lui->addi->slli->addi (and so on) was the best I could come up with. I feel somewhat sheepish that my mind didn’t come up with ld rd, (datapointer) … which seems like something obvious the assembler could even come up with — if the assembler knew your datapointer convention. From a best practices viewpoint, my first extra concern is that data is often rw while code is ro — so there are security implications to the practices of the toolkit — but that doesn’t detract from the solution existing.

My mind just wandered to an even more RISC situation where LD commands weren’t provided and you had simply a shift-into-register … either a 1 or a 0. My mind spins on this for no reason.

… but my point remains — I’ve read the spec. I have a copy of it between my comfy evening chair and my bed. Without some explicit discussion of this point, I was left without a satisfying solution to my inquiry about the RV32G and RV64G. It seems others may have been similarly struck. Documentation is not about simply specifying everything you have to say, it’s also about meeting your audience in/with that dialogue.