The Big RISCY BUSINESS Question Thread

I’ve decided rather than making a topic here every time I have a question
on my show, I’ll just have one big topic where I post all my questions; so here it is!

On episode 31, I’ve become stumped on how the PWM signals are driving the LEDs in the
LED fade demo included with the freedom-e SDK.

In led_fade.c we see the following code:

  uint16_t r=0xFF;
  uint16_t g=0;
  uint16_t b=0;
  char c = 0;
  // Set up RGB PWM
  PWM1_REG(PWM_CFG)   = 0;
  // To balance the power consumption, make one left, one right, and one center aligned.
  // Period is approximately 244 Hz
  // the LEDs are intentionally left somewhat dim, 
  // as the full brightness can be painful to look at.
  PWM1_REG(PWM_CMP0)  = 0;


    volatile uint64_t *  now = (volatile uint64_t*)(CLINT_CTRL_ADDR + CLINT_MTIME);
    volatile uint64_t then = *now + 100;
    while (*now < then) { }
    if(r > 0 && b == 0){
    if(g > 0 && r == 0){
    if(b > 0 && g == 0){
    uint32_t G = g;
    uint32_t R = r;
    uint32_t B = b;
    PWM1_REG(PWM_CMP1)  = G << 4;            // PWM is low on the left, GPIO is low on the left side, LED is ON on the left.
    PWM1_REG(PWM_CMP2)  = (B << 1) << 4;     // PWM is high on the middle, GPIO is low in the middle, LED is ON in the middle.
    PWM1_REG(PWM_CMP3)  = 0xFFFF - (R << 4); // PWM is low on the left, GPIO is low on the right, LED is on on the right.

From what I understand, we aren’t scaling so pwms will be the low 16-bits of the
pwm counter. We are starting our colour at r = 0xFF, so the logic is going to cycle
us between these rgb values 0xFF0000, 0x00FF00, 0x0000FF…

Looking at the led as it fades from colour to colour, that is exactly what the program
seems to be doing.

My confusion comes from the last 3 lines where we set the values of PWM_CMP1-3.

We see that we are producing a right-aligned PWM waveform for G, center aligned for B,
and left-aligned for R. By doing 0xFFFF - R, it looks like with the subtraction we
are inverting it back to behave like a right-aligned waveform? I tried graphing
out a right aligned waveform for G where G is 0xFF. I put 2^16 units on the x-axis
as the counter should reset at that point since that is when pwms would overflow
and we have PWM_CMP0 at 0. So we start by doing 0xFF << 4 which would give us
0xFF0. This would give us a signal that is off for ~6% of the PWM cycle and on
for the rest as it is right-aligned. As the value in G decreases, the time we are
pulling the line low would decrease, making the percent of the cycle we are off for
smaller until it is on 100%. Wouldn’t that mean we start with a pretty bright green light
and let it grow brighter? That is backwards from what I would logically expect, and what
we observe when we see the program run. I also don’t understand (B << 1) << 4, isn’t
that just B << 5?

Logically I would expect a left aligned waveform for R, G, and B where duty cycle
corresponds directly with the brightness of that LED, so I am totally lost regarding
how this code is working.

The comments on those last three lines is trying to explain what’s going on:

The PWM always creates waveforms that start low and go high (and may go low again). It never creates waveforms that start high and go low.

So how can we get a right-aligned waveform? We need to invert the output! That is what this line is for:


By XORing the GPIO for the RED LED, its output is inverted.

If we just did that and inverted the waveform, then the LED would be on more than we wanted and would be way too bright. So, that is why we do:

0xFFFF  - (R << 4)
1 Like

I see, based on the e300 manual I had the impression that the PWM hardware generated right-aligned waveforms, and that by inverting it we would get a left-aligned. We tested it out on episode 32 and saw that you are correct - it produces left-aligned waveforms and inverting gives right-aligned waveforms.

I still don’t understand why we do PWM1_REG(PWM_CMP2) = (B << 1) << 4;

That is the centre-aligned waveform, I see that the << 1 must be compensating for the fact that it is centre-aligned just like how 0xFFFF - n compensates for a right-aligned waveform, but I haven’t gotten the intuition of how this is working for the centre-aligned one yet.

edit: In the last episode we figured out that it was probably supposed to be PWM1_REG(PWM_CMP2) = (B >> 1) << 4; because a centre aligned waveform has 2x the duty cycle compared to a left aligned waveform, so we divide our input by two (shifting right by 1) to get the same brightness as our other channels.

Currently we are trying to find the source code within gcc for the __atomic_fetch_xor builtin, as atomic_fetch_xor_explicit in demo_gpio seems to be a macro that calls that builtin

Perhaps… but there was also a fair amount of “that looks about right” :slight_smile: If you try omitting the << 1, does the blue come out too bright? I believe I found the blue LED a bit overpowering which is why the code toned it down.

Where can we find where it is defined how uninitialized variables with static storage duration are initialized for the hifive1? Section 5.1.2 of N1570 says:

All objects with static storage duration shall be initialized (set to their initial values) before program startup. The manner and timing of such initialization are otherwise unspecified.

(I assume this is unchanged from C89)

On Linux, I know it is defined that the uninitialized ones go in the bss segment which get zero initialized by the program loader, but I don’t know how this works for the hifive1 since it is bare metal, where would I look to find out how it works?

It’s the same. Run objdump -d on one of your programs.

_start sets up the gp register then does some arithmetic with _edata and _end and then calls memset.

00010074 <_start>:
   10074:       0000a197                auipc   gp,0xa
   10078:       04418193                addi    gp,gp,68 # 1a0b8 <_gp>
   1007c:       0000a517                auipc   a0,0xa
   10080:       8ac50513                addi    a0,a0,-1876 # 19928 <_edata>
   10084:       0000a617                auipc   a2,0xa
   10088:       91060613                addi    a2,a2,-1776 # 19994 <_end>
   1008c:       8e09                    sub     a2,a2,a0
   1008e:       4581                    li      a1,0
   10090:       2c19                    jal     102a6 <memset>
   10092:       00000517                auipc   a0,0x0
   10096:       17250513                addi    a0,a0,370 # 10204 <__libc_fini_array>
   1009a:       2a15                    jal     101ce <atexit>
   1009c:       2a79                    jal     1023a <__libc_init_array>
   1009e:       4502                    lw      a0,0(sp)
   100a0:       004c                    addi    a1,sp,4
   100a2:       4601                    li      a2,0
   100a4:       28dd                    jal     1019a <main>
   100a6:       a281                    j       101e6 <exit>
1 Like

Some more additional info.
bsp/env/start.S is the source code of the above routine. (should start at 0x20400000)
bsp/env/freedom-e300-hifive1/ is a map file which specifies the address where program and data are placed.

1 Like

Is there any reason we couldn’t just do
delta_mcycle * mtime_freq / delta_mtime
instead of
(delta_mcycle / delta_mtime) * mtime_freq + ((delta_mcycle % delta_mtime) * mtime_freq) / delta_mtime
in init.c’s measure_cpu_freq? We worked it out on the series that the second part of the equation is correcting for the loss from the integer division, and it seems that you would get the same answer without the correction term if you just rearrange it so that the multiplication happens before the division.

Also why is measure_cpu_freq attribute((no_inline)), is there a reason we don’t want the compiler to inline this?

What are the largest values values delta_mcycle and delta_mtime might have? What will you get when you multiply them? What is the biggest number you can store in an int?

1 Like

The reason measure_cpu_freq is no_inline is to make sure its code is in cache. In get_cpu_freq that function is called once to warm the cache, and then it’s called with n=10 to run for 10 timer ticks for real. If it wasn’t no_inline and it happened to be inlined, the second call would jump to a different code address, which would probably not be in cache. When a cache miss happened the timings would be perturbed, affecting the CPU clock frequency calculation. The reason a cache miss would affect the timings is that (at the very least) the timing would now also reflect the latency of loading the code from flash storage, rather than just the time it takes the CPU to compute the instructions.


So we’ve been studying the bootloader lately and I’m confused about LED related stuff again…

When the double tap don’t boot path is taken, before it enters the infinite loop it sets up the pwm compare registers like this:


My understanding of PWM_CMP0 was that it allows us to control the period when the pwmzerocmp bit is set, as that will cause the counter to reset after it passes the value in PWM_CMP0. However, we aren’t setting the pwmzerocmp bit, and it isn’t clear to me what PWM_CMP0 does in that case. I also don’t understand why PWM_CMP3 is being set to 0xFF before the infinite loop as it will just get overwritten by the value from the fading code.

The other question I have is about the code for the green LED blink, it looks like this:


My understanding is that we are turning on the green LED through SW rather than using the PWM hardware. Conceptually I would expect us to send a 1 over the pin until we decide to turn it off, so I figured we would just do this:


Doesn’t turning on the XOR for that bit cause it to invert the value we are sending on the port (and thus over the pin)? That would mean we are turning on the green LED by sending 0 over the gpio pin?

I don’t know about you, but I find programming the PWM-driven LEDs more an art than a science… try some things, see how it looks to the human eye, tweak, and repeat. For the first question, the PWM1_REG(PWM_CMP0)... lines are just leftovers from some such trials and should be cleaned up (but we also want to include the source that actually comes on the board).

For driving the Green LED blink, I believe your code snippet is from here:

Yes, the LEDs are active low, so we need to drive a 0 to turn the LED ON. We use the VAL=1 and XOR=1 because it’s a bit more clear in the intent than


which would also work.

1 Like

There is a science to accurate colour reproduction, but it would be total overkill for things like blinking the LED in the bootloader, or fading the LED for a demo, so your method of just << 4 makes a lot more sense for what is trying to be accomplished. I think accurate colour reproduction would involve things such as calibrating against the reference white D65, do things like colour space conversion, gamma correction, and even then I think the characteristics of LEDs change over time as they age, maybe in extreme environmental conditions, etc. My understanding is that a full blown LED driver tries to accurately reproduce colours with those sorts of considerations, but like I say - that is total overkill for a blinkin’ light.

This is what I was fundamentally confused about, I was expecting the opposite. I’ll have to re-review the LED fade demo with this in mind!

One of my viewers described it to me as follows:

[01:12] croepha: Alright, go to the board schematics, page three, grid D1
[01:13] croepha: There are three LEDs, and the are being provided 3.3v from the mains
[01:14] croepha: and the SOC is providing a switched ground on GPIO 19-22
[01:14] croepha: it needs to be inverted, because low will turn the LED on
[01:14] croepha: if its high, then no current will flow though the LED
[01:15] croepha: because the potential will be the same on both sides of the LED

Thank you for the help/insight :slight_smile:

1 Like

We’re studying entry.S on the series and I have two questions about the following snippet of code:

  csrr a0, mcause
  csrr a1, mepc
  mv a2, sp
  call handle_trap
  csrw mepc, a0

Why do we mv a2, sp? I don’t understand the purpose of passing the stack pointer as a 3rd argument to the trap handler, and the handler in the freedom-e-sdk doesn’t use it. My other question is why do we do csrw mepc, a0. I understand that a0 is the return value of the trap handler, but I don’t understand why we’d write it back to mepc. Isn’t mepc just to tell us what instruction caused the trap?

Why do we mv a2, sp? I don’t understand the purpose of passing the stack pointer as a 3rd argument to the trap handler, and the handler in the freedom-e-sdk doesn’t use it.

My other question is why do we do csrw mepc, a0. I understand that a0 is the return value of the trap handler, but I don’t understand why we’d write it back to mepc. Isn’t mepc just to tell us what instruction caused the trap?

You are correct in that the C code handler in entry.S doesn’t make use of the stack pointer nor does it change mepc. The trap entry/exit code in Freedom-E-SDK is comes from a more full featured trap handler where context switching may be involved. In the case of a context switch you would want to save off the sp for the context before changing sp to the new context’s stack and also return to a different pc then from where the trap was taken.

1 Like

Thanks for the help Drew :slight_smile:

Today we were looking into the GNU Assembler syntax specifics and one thing I’m wondering about is why there is .align 2 in the .text.entry section and what exactly it does. I understand that it specifies alignment for padding the location counter to a storage boundary but I don’t know what the location counter is (I’ve never looked into ELF/object file format details before) and basically I’m just wondering if anyone has helpful links to learn more about this stuff.

Also in the GAS manual it says that the size of the argument to .align is platform dependent, some use bytes, others words, and others use it to specify the number of low-order 0 bits the location counter must have. I’d like to know where I can find out what the RISC-V version interprets it as.

One other thing I’m not clear on is how the flow of the files through the toolchain pipeline totally works. I see that there is a header guard and C preprocessor directives in entry.S which makes me think this file must get passed to gcc and run through the preprocessor before it hands it off to GAS? I’m confused though in that the #include directives would expand to C code within the entry.S translation unit, and I would think if you just naively passed the result into GAS it wouldn’t like the C stuff, I’m curious what exactly GCC/GAS/whatever else is doing under the hood to process this file

The location counter (also known as “.”) is the memory address at which the next instruction assembled (or define byte/word etc) will be placed.

The relevant documentation is for gas … or for any other assembler in the last 40 years probably :wink:

RISC-V .align specifies the number of zero bits in the address after the alignment. So “.align 5” means something like “. EQU (.+31)&~31”

1 Like

Just to add to @brucehoult’s comment, some there are some things in the RISC-V specification which require a specific alignment and using the .align X assembler is how that alignment is enforced. For example mtvec must be aligned to a 4-byte boundary.

1 Like

Also, after a Define Byte or an ASCII string you’ll want at least a .align 1 before following code if C extension is enabled, or .align 2 if C is not supported. That’s required to make sure you don’t crash.

You might sometimes be able to increase the program speed by putting .align 5 before commonly called functions (on HiFive1, with a 32 byte cache line size).

1 Like

I intend to build a RISC-V PC with the HiFive Unleashed and HiFive Unleashed Expansion Board (I’ve got the HiFive Unleashed and have done an unboxing and played around with it on the series already)!

What graphics card/other hardware are you guys @ sifive using for your demos/what do you recommend? :slight_smile:

I’m also curious if the Freedom E SDK can run on a RISC-V host yet? It’d be nifty to program my HiFive1 from my HiFive Unleashed… :wink: