cyberwolf
(cyberwolf)
January 22, 2026, 2:27pm
1
Hello SiFive community,
I have started up the board and flashed the new Ubuntu image. Everything is fine so far. I have installed all .deb packages from ESWIN (ESSDK, FFMPEG) and now I would like to try out the NPU.
Question 1: What is the best way to do this? I can only find very few tutorials on this topic.
Question 2: ESWIN provides .deb packages such as es-sdk-sample-npu-qwen.deb, …-npu_1.10.deb. Is there detailed documentation on the .deb packages, what they do, and how to use them?
`dpkg -I` does not provide any information.
Question 3: Are there any examples or documentation from the manufacturer on how to start sample applications and measure performance?
Question 4: Where can I find documentation on NPU? TRM from ESWIN says I have to contact the manufacturer. Is the NPU a closed IP core, or why is the technical information being withheld?
Thank you for your attention.
Cyberwolf
ganboing
(Bo Gan)
January 28, 2026, 1:01am
2
I found these documents so far:
From ESWIN:
From DeepComputing:
# DC-ROMA RISC-V AI PC Install AI Models(Deepseek-7B) Guide
### **Note:**
- Adjust the MMZ size first in U-Boot:
- Entering U-Boot:
- Connect the AI PC via a serial port.
- During power-on startup, repeatedly press Enter to access U-Boot.
- Modify Environment Variables:
```
# Configure env to output only to serial port
env set stdout serial
# Permanently change MMZ size (11GB example)
setenv fdt_cmd "fdt mmz mmz_nid_0_part_0 0x1c0000000 0x2c0000000" #die0
setenv bootcmd "run fdt_cmd;${bootcmd}"
saveenv
# Reboot the system
boot
```
_(The embedded guide is truncated here.)_
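As a quick sanity check on the `fdt mmz` arguments above, the two hex values appear to be the MMZ region base address and size; this is an assumption, but it is consistent with the "11GB example" comment in the snippet:

```python
# Decode the fdt mmz arguments (assumption: first value is the region
# base address, second is the region size, matching the "11GB example"
# comment in the U-Boot snippet above).
GiB = 1 << 30
base = 0x1C0000000
size = 0x2C0000000
print(f"base: {base // GiB} GiB, size: {size // GiB} GiB")  # base: 7 GiB, size: 11 GiB
```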
Beware that the DeepComputing guide targets the EIC7702, which is a dual-die version of the same chip. The P550/EIC7700 is a single-die chip, so you may have to adjust the steps slightly. You may also need to use the ESWIN OS images rather than the SiFive one, from Releases · eswincomputing/hifive-premier-p550-ubuntu · GitHub
Also check this out:
(GitHub issue opened 20 Jan 2026, 01:56 UTC)
## Summary
The vendor image includes a working **dual-NPU “peer”** model format … + runtime path (e.g., `deepseek_7b_1k_int8_peer`) that loads across **both** NPUs and allocates from **both** MMZ pools. However, there is no public tooling or end-to-end documentation to *create* peer/composite models from upstream weights (or from single-NPU `.model` artifacts). Please publish the model conversion / compilation toolchain (or at minimum the peer/composite build step + format docs) so external users can generate peer builds for other LLMs and actually utilize both NPUs.
## Why this matters
- EIC7702 boards expose **two NPUs**; using only one is a major capability loss.
- Peer mode enables larger models (or larger context / headroom) by distributing artifacts across both NPU memory pools instead of forcing a single NPU carveout to be huge.
- This unblocks integration work with higher-level serving stacks (OpenAI-compatible endpoints, OpenWebUI/OpenWebLLM frontends, etc.) because folks can build the required artifacts themselves.
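A rough back-of-envelope illustrates the second point. The figures are assumptions for illustration: ~1 byte per parameter for int8 weights, and 6 GiB per MMZ pool (the size reported by `/proc/eswin/vb` on this image); KV cache and runtime overhead are ignored:

```python
# Back-of-envelope: do 7B int8 weights fit in one MMZ pool vs. two?
# (Assumptions: ~1 byte/parameter for int8, 6 GiB per pool, no overhead.)
GiB = 1 << 30
params = 7_000_000_000                 # e.g. deepseek_7b
weights_gib = params / GiB             # int8 ≈ 1 byte per parameter
single_pool, both_pools = 6.0, 12.0
print(f"weights ≈ {weights_gib:.2f} GiB")                        # ≈ 6.52 GiB
print(f"fits in one 6 GiB pool: {weights_gib <= single_pool}")   # False
print(f"fits across both pools: {weights_gib <= both_pools}")    # True
```

So even before considering context length, a 7B int8 model only has headroom when split across both pools.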
## Evidence that peer mode works (current vendor image)
### Peer model artifacts exist
- `/opt/eswin/sample-code/npu_sample/qwen_sample/models/deepseek_7b_1k_int8_peer/`
- Peer-specific shards exist (example names):
- `lm_npu_b1_d0.model` / `lm_npu_b1_d1.model`
- `modified_block_0_npu_b1_d0.model` / `modified_block_0_npu_b1_d1.model`
- `du -sh` footprint is similar to non-peer (expected for “split across dies” rather than “duplicate everything”).
### Runtime opens both NPUs during peer inference
- `sudo fuser -v /dev/npu0 /dev/npu1` shows the same PID has both devices open during peer inference.
### Both MMZ pools are actively allocated in peer inference
- `cat /proc/eswin/vb` reports both pools in-use with roughly symmetric free space remaining.
- Example observed values (per pool):
- Total: `0x17ffff000` ≈ **6.00 GiB**
- Free: `0x589e2000` ≈ **1.38 GiB**
- This implies ~**4.62 GiB** used on **each** pool (consistent with dual-NPU split).
(If you want exact logs/screenshots, I can attach them; the above is reproducible on the current vendor image.)
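The GiB figures above follow directly from the quoted hex values:

```python
# Reproduce the GiB figures from the /proc/eswin/vb values quoted above.
GiB = 1 << 30
total = 0x17FFFF000
free = 0x589E2000
print(f"total: {total / GiB:.2f} GiB")           # 6.00
print(f"free:  {free / GiB:.2f} GiB")            # 1.38
print(f"used:  {(total - free) / GiB:.2f} GiB")  # 4.62 per pool
```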
## Documentation indicates peer/composite is a supported concept (but the build path is missing)
The ENNP docs from the `https://github.com/eswincomputing/ebc7702-dev-board-ubuntu` repo explicitly describe dual-device / dual-die execution and “composite model” runtime support, and they also describe an offline toolchain. The missing piece is a publicly available, externally usable workflow for generating peer/composite LLM artifacts.
### 1) Dual-die inference and “composite model” runtime support is documented
From **ENNP User Manual v1.3**, §7 “Dual-Die Inference”:
- Describes inference on a “dual-die architecture” and notes pipeline inference can run in parallel on both dies, with guidance to minimize cross-die interaction.
- References runtime concepts/APIs such as:
- `ES_NPU_GetNumDevices(...)`
- `ES_NPU_SetDevice(...)`
- `ES_NPU_LoadCompositeModel(...)`
From **ENNP Developer Manual v0.9.4**, the NPU API is documented including:
- `ES_NPU_GetNumDevices` / `ES_NPU_SetDevice`
- `ES_NPU_LoadCompositeModel(...)` and the associated `NPU_COMPOSITE_MODEL_INFO_S` structure (suggesting multi-device model groupings are supported at the runtime/API level).
**So peer/composite across multiple devices is documented as a supported mode — but the public model-build pipeline for it is missing.**
### 2) An offline model toolchain is documented (but not publicly distributed in a usable way)
From **ENNP User Manual v1.3**, §3.3–3.4 and §6.5:
- Offline tools are named:
- **EsQuant** (quantization tool)
- **EsAAC** (model compiler)
- **EsGoldenDataGen**
- **EsSimulator**
- The docs also state the **EsNNTools** folder includes a Docker image containing the toolchain and show example usage of running EsAAC inside a container.
This strongly suggests the toolchain exists, but it is not currently published/linked in a way external users can obtain and use to build peer/composite LLM artifacts.
## Documentation discoverability / fragmentation (request)
The dual-die/composite references were not discoverable from the main “Framework” repo/docs for the DC-ROMA/Framework workstream, and required digging through a separate vendor dev-board repository and zip bundle to find. Please:
- Provide a **single canonical documentation location** for ENNP + model toolchain docs, and
- Link it clearly from the DC-DeepComputing “Framework” repo (and/or vendor image README), including peer/composite model guidance and the toolchain distribution.
## What I’m requesting (in order of usefulness)
### 1) Toolchain to build peer/composite models
Either source-available or binary release is fine, but it needs to be usable externally. Concretely, publish whatever is required from the documented tool suite (e.g., the EsNNTools Docker image(s) or equivalent packages) plus the missing peer/composite build steps.
Specifically:
- A documented way to convert an upstream model into ESWIN `.model` artifacts (docs suggest ONNX is a supported input; document accepted formats and constraints).
- The peer/composite build path that produces the multi-device shard set (examples):
- `*_d0.model` / `*_d1.model` (or equivalent die/device shard naming)
- rules for what is duplicated vs partitioned (e.g., embeddings)
- the correct multi-device config schema (for `es_qwen2` and/or direct ENNP runtime usage)
### 2) Documentation for peer/composite model format + partitioning rules
At minimum, publish:
- How a composite/peer model package is structured on disk (directory layout, required metadata, naming rules)
- The partitioning strategy (layer split policy; what must co-reside; what can be independent)
- Constraints/limits (supported families, quantization requirements, context limits, memory headroom expectations)
### 3) A reproducible reference example
Provide one small reference that takes a known model and produces a peer/composite build:
- Inputs: upstream weights (or non-peer `.model` outputs)
- Output: peer/composite shards + config + a command to run
- Verification steps: demonstrate both `/dev/npu0` and `/dev/npu1` are used and both MMZ pools allocate
### 4) Optional but very helpful: runtime knobs
If peer/composite mode can be enabled without regenerating artifacts (or with minimal metadata changes), document:
- environment variables / flags
- how device selection/binding works (NPU0 vs NPU1) and how allocations map to `mmz_nid_0_part_0` / `mmz_nid_1_part_0`
## Acceptance criteria (so we know it’s “done”)
- External user can take a supported upstream model (or non-peer `.model` output), follow documented steps, and produce a peer/composite model package.
- The resulting build:
- opens both `/dev/npu0` and `/dev/npu1` during inference
- allocates from both MMZ pools
- runs inference successfully (even if performance varies by model)
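A minimal sketch of how the first criterion could be checked automatically, using standard Linux procfs (`/proc/<pid>/fd` symlinks); the `/dev/npu*` paths come from the vendor image, everything else is generic:

```python
# Find which PIDs currently hold a device node open, by scanning the
# /proc/<pid>/fd symlinks (standard Linux procfs; run as root to see
# other users' processes).
import os

def pids_holding(dev_path):
    holders = set()
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = f"/proc/{pid}/fd"
        try:
            entries = os.listdir(fd_dir)
        except OSError:          # process exited, or insufficient permissions
            continue
        for fd in entries:
            try:
                if os.readlink(f"{fd_dir}/{fd}") == dev_path:
                    holders.add(int(pid))
            except OSError:
                pass
    return holders

# During peer inference the same PID should appear in both sets:
# pids_holding("/dev/npu0") & pids_holding("/dev/npu1")
```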
## Environment
- Board: DC-ROMA II / EIC7702 (FML13V03)
- OS image: Ubuntu 24.04 LTS v1.0.15019
- Kernel: `Linux roma 6.6.92-eic7x-2025.07 #2025.09.26.03.45+ SMP Fri Sep 26 03:53:01 UTC 2025 riscv64`
- Sample runtime: `/opt/eswin/sample-code/npu_sample/qwen_sample/bin/es_qwen2`
Thanks — publishing the peer/composite build toolchain (or even just the missing “split + package” stage plus format docs) would dramatically increase the value of the platform and enable moving beyond sample models.