Testing Unmatched at 1.4 GHz

I’ve bumped the CPU clock rate on my Unmatched to 1.4 GHz. It seems fine so far. As a test, I built the clang 12.0.0 compiler, which takes 13.5 hours at 1.4 GHz (with clang-tools-extra, libcxx and libcxxabi).

Unfortunately, the resulting compiler doesn’t work. I get the error below, which on x86 is usually fixed by installing gcc-multilib, but there is no gcc-multilib package on Ubuntu riscv64. The same problem also prevents building gcc.

ubuntu@riscv64:~/xfer$ clang -O2 xport.c
In file included from xport.c:5:
/usr/include/stdio.h:27:10: fatal error: 'bits/libc-header-start.h' file not found
#include <bits/libc-header-start.h>
         ^~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
ubuntu@riscv64:~/xfer$

If anyone knows what’s going on with that, please chime in. Also, if anyone has some good stress tests to run, I’ll be happy to give them a try. I’ve tried stress-ng, but I’m not sure which of its tests is best.

And just for fun, here’s a video of the Unmatched running GNU Radio. This flow is a DVB-S2 transmitter sending digital television over the air with a transmit-capable SDR (connected via USB 3.0). Sorry, no YouTube (or Google) account.

https://www.w6rz.net/unmatched.mp4


Not a stress test, but just to add an (easily predicted) data point, could you run https://hoult.org/primes.txt ?

Just with gcc -O1, please.

ubuntu@riscv64:~/xfer$ ./primes
Starting run
3713160 primes found in 19054 ms
236 bytes of code in countPrimes()
ubuntu@riscv64:~/xfer$

Cheers. I’ve updated the file with the result.

root@unmatched:~/bench# ./primes
Starting run
3713160 primes found in 21953 ms
236 bytes of code in countPrimes()

for plain FreedomUSDK 2021.05.00 at 1.2 GHz.

21953/19054 = 1.152
1.4/1.2 = 1.167

A little over a percent discrepancy. You get about that much variation just running it multiple times.
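For anyone who wants to repeat the comparison with their own numbers, here is a minimal Python sketch of the same arithmetic. The function name `scaling_efficiency` is made up for illustration; the timings and clock rates are the ones posted above.

```python
def scaling_efficiency(base_ms, base_ghz, new_ms, new_ghz):
    """Return (measured speedup, clock ratio, speedup / clock ratio)."""
    speedup = base_ms / new_ms          # shorter run time means a higher speedup
    clock_ratio = new_ghz / base_ghz    # ideal speedup if performance tracked the clock
    return speedup, clock_ratio, speedup / clock_ratio


# primes timings from this thread: 21953 ms at 1.2 GHz, 19054 ms at 1.4 GHz
speedup, clock_ratio, efficiency = scaling_efficiency(21953, 1.2, 19054, 1.4)
print(f"speedup {speedup:.3f}, clock ratio {clock_ratio:.3f}, "
      f"shortfall {(1 - efficiency) * 100:.1f}%")
```

A shortfall near zero means the benchmark scales with the core clock, which is what you would expect for a small loop that mostly stays in cache.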

Beats Odroid C2 (A53) at higher MHz by a little bit, which is good. And Pi 3 by a lot.


Are you building GCC from the Ubuntu source package or from the upstream FSF GCC sources?

I’m not running Ubuntu on my Unmatched, but in general, building GCC on an Ubuntu system requires --enable-multiarch, because Debian/Ubuntu puts header files and libraries in different places than other Linux distros. So bits/libc-header-start.h is not in /usr/include but rather in /usr/include/riscv64-something, and GCC will only search that directory if you use --enable-multiarch and specify a --build/host/target that matches the OS directory name.

There is an additional problem: --enable-multiarch may not work for RISC-V without Ubuntu patches that haven’t been posted upstream yet. I haven’t checked recently. There are hacks that allow x86 Linux GCC builds to work without --enable-multiarch, but not for RISC-V and other ISAs. So for the moment, I think it is expected that the FSF GCC sources won’t build, but the Ubuntu gcc source package will, as it has patches to make the build work the Debian/Ubuntu way.
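To check where the header actually lives on a given system, a small Python sketch like the following may help. It only looks in the include root and one level of subdirectories, which is where a multiarch layout would put it. The function name `find_header` and the default root are illustrative choices, not part of any toolchain.

```python
from pathlib import Path


def find_header(header, include_root="/usr/include"):
    """Return the paths where `header` exists: directly under include_root,
    or one level down in a per-architecture (multiarch) subdirectory."""
    root = Path(include_root)
    if not root.is_dir():
        return []
    # Candidate locations: <root>/<header> and <root>/<subdir>/<header>.
    candidates = [root / header]
    candidates += [d / header for d in sorted(root.iterdir()) if d.is_dir()]
    return [p for p in candidates if p.is_file()]


for path in find_header("bits/libc-header-start.h"):
    print(path)
```

If the only hit is inside an architecture-named subdirectory, a compiler built without multiarch support would not find it on its default search path, which matches the error shown earlier in the thread.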

I don’t know about clang, but it may be a similar issue.

You can probably get a better answer if you ask the Debian/Ubuntu folks instead of SiFive.


That’s fantastic.

May I ask, btw, which graphics card you are using?
The UI acceleration looks very good.

No graphics card at all. That’s just an x86 host using ssh with X11 forwarding to access the Unmatched. In fact, there are three hosts being used: the Unmatched (in the lower right-hand terminal - ubuntu@riscv64) connected to a transmit-capable SDR and antenna, another x86 host in another room in my house with a DVB-S2 receiver and antenna (lower left-hand terminals - re@beavis), and the controlling host (upper right-hand terminal running ffmpeg and recording the display and audio - re@w6rz). The remote x86 host is sending the received Transport Stream over UDP back to the control host, which is playing it with VLC.


i see, thanks for clarifying!

Testing at 1.5 GHz now. For @bruce

ubuntu@riscv64:~/xfer$ ./primes
Starting run
3713160 primes found in 17712 ms
236 bytes of code in countPrimes()
ubuntu@riscv64:~/xfer$

Nice.

That’s getting to be closer to the kind of speed we’d all been expecting, given that most FU540 chips run fine at 1.4 or 1.5 GHz and the U74 was originally announced as 20% higher clock speed than U54 on the same process.

Would be great if you could upload just the U-Boot partition images (is that the right one?) with settings for 1.4 and 1.5 GHz somewhere, plus a short note on which partition to dd them over :)

I don’t think that we ever said that it would run at a 20% higher clock speed. We did say that the core would be ~20% faster at the same clock speed because of the dual issue pipeline.

Here are the files. While I was at it, I went up to 1.8 GHz (but I’ve only tested to 1.5 GHz, so try those higher clock rates at your peril). I’ve also dimmed the blue LED to something a little less blinding.

https://www.w6rz.net/u-boot-1200mhz.itb
https://www.w6rz.net/u-boot-spl-1200mhz.bin

https://www.w6rz.net/u-boot-1400mhz.itb
https://www.w6rz.net/u-boot-spl-1400mhz.bin

https://www.w6rz.net/u-boot-1500mhz.itb
https://www.w6rz.net/u-boot-spl-1500mhz.bin

UPDATE: From user feedback, it seems the Unmatched does not work above 1.5 GHz.

https://www.w6rz.net/u-boot-1600mhz.itb
https://www.w6rz.net/u-boot-spl-1600mhz.bin

https://www.w6rz.net/u-boot-1700mhz.itb
https://www.w6rz.net/u-boot-spl-1700mhz.bin

https://www.w6rz.net/u-boot-1800mhz.itb
https://www.w6rz.net/u-boot-spl-1800mhz.bin

sudo dd if=u-boot-1500mhz.itb of=/dev/mmcblk0p14 bs=4k oflag=direct
sudo dd if=u-boot-spl-1500mhz.bin of=/dev/mmcblk0p13 bs=4k oflag=direct
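Since a typo in a dd target can wipe the wrong partition, it may be worth generating the two commands instead of typing them. The Python sketch below only builds and prints them, it does not run anything. The helper name `flash_commands` and the `TESTED_MHZ` tuple are illustrative; the file names and the p14/p13 partition mapping are taken from the commands above, and the 1.5 GHz ceiling from the update in this post.

```python
# Clock rates this thread reports as working; higher images reportedly fail.
TESTED_MHZ = (1200, 1400, 1500)


def flash_commands(mhz, device="/dev/mmcblk0"):
    """Build the two dd commands for one clock rate: the .itb image goes to
    partition 14 and the SPL .bin to partition 13, as in the commands above."""
    if mhz not in TESTED_MHZ:
        raise ValueError(f"{mhz} MHz is not among the rates reported working")
    return [
        ["sudo", "dd", f"if=u-boot-{mhz}mhz.itb", f"of={device}p14",
         "bs=4k", "oflag=direct"],
        ["sudo", "dd", f"if=u-boot-spl-{mhz}mhz.bin", f"of={device}p13",
         "bs=4k", "oflag=direct"],
    ]


for cmd in flash_commands(1500):
    print(" ".join(cmd))
```

Going back to stock is then just `flash_commands(1200)`, assuming the SD card is still `/dev/mmcblk0` on your system.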

I’m using these stress-ng tests. This one is supposed to raise the temperature as much as possible.

stress-ng -v --matrix 0 -t 10m

This one is a memory test.

stress-ng --vm 1 --vm-bytes 4G --verify -v -t 10m

The -t 10m parameter runs the test for 10 minutes. You can increase that or even use -t 8h for an eight hour overnight test.
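If you want to run both tests back to back (say, overnight) and get a simple pass/fail summary, something like this Python sketch could work. The names `stress_commands` and `run_suite` are made up here, and treating a non-zero exit status as a failure is an assumption about how stress-ng reports errors, so check the log output as well.

```python
import subprocess


def stress_commands(duration="10m", vm_bytes="4G"):
    """Return the two stress-ng command lines from this post, by purpose."""
    return {
        "thermal": ["stress-ng", "-v", "--matrix", "0", "-t", duration],
        "memory": ["stress-ng", "--vm", "1", "--vm-bytes", vm_bytes,
                   "--verify", "-v", "-t", duration],
    }


def run_suite(commands, runner=subprocess.run):
    """Run each command in turn; map test name -> True if it exited with 0.
    Assumption: a non-zero exit status means the test failed."""
    return {name: runner(cmd).returncode == 0 for name, cmd in commands.items()}


# Print the command lines only; call run_suite(stress_commands("8h")) to execute.
for name, cmd in stress_commands("8h").items():
    print(name, "->", " ".join(cmd))
```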


Superb! Thanks.

It’s hard to find at this point.

The U84 announcement is interesting:

U84, the company’s mainstream U8-Series variant, is claimed to offer a 3.1x improvement in performance compared to the last-generation U74. Of that uplift, 2.3x comes from an increase in instructions per cycle (IPC); a further 1.4x comes from a higher maximum operating frequency of 2.6GHz. Going a few generations back, to the U54, the difference is even more stark: built on the same 28nm process, the U84 is an impressive 5.3x faster; moving to the 7nm process ups that to 7.2x.

The U84 to U74 comparison is in slightly different terms to the U84 to U54 comparison, but I think the implication is that the 3.1x for U84 to U74 is at iso-process, as the 5.3x for U84 to U54 explicitly is. So that makes U74 to U54 5.3/3.1 = 1.7x iso-process.

1.7x seems a little high for simply dual-issue (20% seems very low!), so it seems there must be a higher frequency component iso-process in that as well.

The longer pipeline in U74 vs U54 would also tend to suggest that.
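The arithmetic behind that inference, as a quick Python sketch. All inputs are the figures quoted from the announcement plus the ~20% same-clock claim; the variable names are mine, and the "residual" is only what is left over if the ~20% figure is taken at face value.

```python
# Figures quoted from the U84 announcement
u84_vs_u74 = 3.1        # assumed to be at the same process
u84_vs_u54_iso = 5.3    # explicitly at the same process
u84_vs_u54_7nm = 7.2    # after moving to 7nm

# Implied U74 vs U54 at the same process
u74_vs_u54 = u84_vs_u54_iso / u84_vs_u74

# Gain attributed to the process change alone
process_gain = u84_vs_u54_7nm / u84_vs_u54_iso

# What remains if dual issue contributes ~20% at the same clock
same_clock_gain = 1.2
residual = u74_vs_u54 / same_clock_gain

print(f"U74 vs U54, iso-process: {u74_vs_u54:.2f}x")
print(f"7nm process gain:        {process_gain:.2f}x")
print(f"residual beyond ~20%:    {residual:.2f}x")
```

The residual is the part that would have to come from something other than the quoted same-clock improvement, such as clock speed or a newer core revision.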

But who knows? :)

In the Fall Linley Processor Conference talk about the Unmatched, Yunsup said it would be ~20% faster at the same clock speed than Unleashed. That is the only performance claim we made for the Unmatched that I am aware of.

This is a complicated subject. It is important not to confuse core performance with SoC performance. SiFive isn’t a hardware company, and isn’t trying to make the best SoC. This SoC is a demonstration SoC for the Unmatched board and is not a commercial product. The primary goal was PCIe support, not performance. The U7 core is an IP core that is improved quarterly, and the version of the core in the Unmatched SoC is actually older than the U7 core used for the U8 comparison you are referring to, so the core in the Unmatched SoC probably has less performance than the one used for that document.

The U7 core in the BeagleV SoC is a newer version of the U7 than the one in the Unmatched. If they get the SoC right, it should have better performance than the Unmatched.


That would be nice. Especially if they manage to get an M.2 slot onto it (i.e. PCIe).

this looks fantastic!

I finally got Ubuntu working and it’s a real pleasure, but I noticed that, compared to the OpenEmbedded SD card image, with the Canonical image for the Unmatched my case fan doesn’t turn on anymore, so I’m a bit apprehensive about boosting the clock speed…

Do you know if using those U-Boot images activates the case fan pins? I wonder what settings need to be adjusted for that… or is that a Linux kernel thing… hmm