TileLink: fragmenting large bursts


#1

Could you please show an example of how to increase the maximum burst size for TL transactions?

I’ve been playing with the TLFragmenter node, but I can’t connect it to the xbar because the port types differ.

I’d like to do something like this:

val Frag = TLFragmenter(16,256)
val (tl_out, tmpEdge) = outer.tlNode.out(0)
val edge = Frag := tmpEdge // <---- This won’t work

This yields the following error:

[error] [DX, UX, EX, BX <: Chisel.Data, EY](h: freechips.rocketchip.diplomacy.NodeHandle[DX,UX,EX,BX,freechips.rocketchip.tilelink.TLClientPortParameters,freechips.rocketchip.tilelink.TLManagerPortParameters,EY,freechips.rocketchip.tilelink.TLBundle])(implicit p: freechips.rocketchip.config.Parameters, implicit sourceInfo: chisel3.internal.sourceinfo.SourceInfo)freechips.rocketchip.diplomacy.NodeHandle[DX,UX,EX,BX,freechips.rocketchip.tilelink.TLClientPortParameters,freechips.rocketchip.tilelink.TLManagerPortParameters,freechips.rocketchip.tilelink.TLEdgeOut,freechips.rocketchip.tilelink.TLBundle]


(Wesley W. Terpstra) #2

The middle line looks like you are defining the implementation of a LazyModuleImp. That’s fine, but you can’t connect diplomatic nodes there. Also, from a stylistic point of view, we don’t generally embed the Fragmenter into the device. Someone might want to use the slave without burst support, so the adapter should not be rolled directly into the implementation.

Take a look in rocket-chip/src/main/scala/devices/tilelink/CLINT.scala.

You’ll see there is a class CLINT which implements a TileLink device using a TLRegisterNode. Then at the bottom is a trait that captures a code snippet that creates the CLINT and a TLFragmenter and attaches them to the cbus.

Generally, you want something like: (myslave.node := TLFragmenter(beatBytes, maxBurstBytes) := xbar.node) when you attach the slave.


#3

Hello Wesley

Thanks for the prompt response. I’m trying to load/store a non-cached 8kB page to my RoCC module. I was able to connect a TLFragmenter with the code below:

// LazyROCC

val source_min = 0x10
val outstanding = 0x20
val source_max = source_min + outstanding

val tmpNode = TLClientNode(Seq(TLClientPortParameters(Seq(TLClientParameters(
  sourceId = IdRange(source_min, source_max),
  name = s"MyCore0")))))

val frag = TLFragmenter (16,256)
override val atlNode = frag := tmpNode

However, compilation gives a size warning and then fails on a require:

[1, 2] => dut
WARNING: TLMonitor instantiated on a bus with source bits (16) > 14; A=>D transaction flight will not be checked

[error] Caused by: java.lang.IllegalArgumentException: requirement failed
[error] at scala.Predef$.require(Predef.scala:264)
[error] at freechips.rocketchip.tilelink.TLFragmenter$$anon$2.$anonfun$new$4(Fragmenter.scala:73)

Any ideas? Thank you!


(Wesley W. Terpstra) #4

The warning can be ignored. The error says that the devices below the TLFragmenter can respond out-of-order, which it can’t support. Generally, we put the Fragmenter in front of the slaves, where this is not a problem. If you want to put it in front of a master, try atlNode = TLFIFOFixer() := TLFragmenter(64, 4096) := tmpNode.

I believe the TileLink spec caps the maximum transfer size at 4kB, so I expect you are going to run into problems with an 8kB burst.
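
For reference, TileLink size fields carry log2 of the byte count, so the sizes discussed in this thread can be checked with a few lines of plain Scala. This is only an illustrative sketch: the object SizeField, its helpers, and the 4 kB limit used in the usage line are assumptions made for the example, not rocket-chip or spec definitions.

```scala
// Plain-Scala sketch; nothing here comes from rocket-chip.
object SizeField {
  // True when n is a positive power of two.
  private def isPow2(n: Int): Boolean = n > 0 && (n & (n - 1)) == 0

  /** log2 of a power-of-two byte count, i.e. the value a size field would carry. */
  def lgSize(bytes: Int): Int = {
    require(isPow2(bytes), s"$bytes is not a power of two")
    Integer.numberOfTrailingZeros(bytes)
  }

  /** Whether a transfer of `bytes` fits under an assumed maximum transfer size. */
  def fits(bytes: Int, maxTransferBytes: Int): Boolean =
    lgSize(bytes) <= lgSize(maxTransferBytes)
}
```

With these helpers, SizeField.lgSize(64) is 6, SizeField.lgSize(4096) is 12, and SizeField.fits(8192, 4096) is false, which is the situation described above for an 8kB burst under an assumed 4kB limit.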


(Wesley W. Terpstra) #5

Also, by putting a Fragmenter at the master as you’ve done, the burst is not atomic. It will appear to slaves to be multiple distinct transfers. Furthermore, by using a FIFOFixer like this, performance to any out-of-order memory (which includes main memory) is going to be very slow, with only one request in-flight at a time.
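
To illustrate why the burst stops being atomic, here is a rough plain-Scala model of what slaves observe once a burst has been fragmented. It is a sketch under assumptions, not the TLFragmenter implementation: Burst and FragmentModel are hypothetical names, and the fragment size is simply passed in, whereas the real adapter derives it from the diplomatic parameters.

```scala
// Plain-Scala model only; this is not rocket-chip code.
final case class Burst(address: Long, bytes: Int)

object FragmentModel {
  private def isPow2(n: Int): Boolean = n > 0 && (n & (n - 1)) == 0

  /** Split one naturally aligned power-of-two burst into pieces of at most maxFragBytes. */
  def fragment(b: Burst, maxFragBytes: Int): Seq[Burst] = {
    require(isPow2(b.bytes), "burst size must be a power of two")
    require(isPow2(maxFragBytes), "fragment size must be a power of two")
    require(b.address % b.bytes == 0, "burst must be naturally aligned")
    if (b.bytes <= maxFragBytes) Seq(b) // small enough: passes through unchanged
    else (0 until b.bytes / maxFragBytes).map { i =>
      // Each piece is an independent transfer from the slave's point of view.
      Burst(b.address + i.toLong * maxFragBytes, maxFragBytes)
    }
  }
}
```

For example, FragmentModel.fragment(Burst(0x80001000L, 4096), 64) yields 64 separate 64-byte transfers, and other traffic may be interleaved between any two of them.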

If you don’t care about atomicity, but DO care about performance, probably a better design is to have the master capable of issuing and tracking multiple outstanding requests at once.
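
A minimal sketch of that alternative, again as a plain-Scala model rather than hardware: the master splits the large transfer into cache-line-sized requests and tags each one with a distinct source ID from its pool. The names OutstandingModel, Request, and plan are hypothetical, and issuing in fixed "waves" is a simplification; a real master would reuse a source ID as soon as its response returns.

```scala
// Plain-Scala model only; this is not rocket-chip code.
object OutstandingModel {
  final case class Request(source: Int, address: Long, lgSize: Int)

  /** Plan a large transfer as waves of requests, one source ID per in-flight request. */
  def plan(base: Long, totalBytes: Int, lgSize: Int,
           sourceMin: Int, outstanding: Int): Seq[Seq[Request]] = {
    val reqBytes = 1 << lgSize
    require(outstanding > 0, "need at least one source ID")
    require(totalBytes % reqBytes == 0, "total must be a multiple of the request size")
    require(base % reqBytes == 0, "base must be aligned to the request size")
    val addresses = (0 until totalBytes / reqBytes).map(i => base + i.toLong * reqBytes)
    // Each wave holds at most `outstanding` requests, so source IDs never collide within a wave.
    addresses.grouped(outstanding).map { wave =>
      wave.zipWithIndex.map { case (addr, i) => Request(sourceMin + i, addr, lgSize) }
    }.toSeq
  }
}
```

Using the source range from post #3, OutstandingModel.plan(0x80000000L, 8192, 6, 0x10, 0x20) describes an 8kB transfer as 128 64-byte requests in 4 waves of 32, so up to 32 requests can be in flight without a FIFOFixer serializing them.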


#6

Hi Wesley

  1. The only reason for me to attach the Fragmenter to the Master was to support larger burst transfers, ideally 4kB and 8kB.

val get = edge.Get(rsource, raddr, a_size)._2

Here I want a_size > 6, but the supported maximum defaults to only 6 (2^6 = 64B, the default cache line size). I don’t use the cache, so I want to use TL-UH with large bursts.

  2. The TL spec doesn’t prohibit large burst transfers, which is nice. However, it recommends page-aligned transfers with a 4kB page. I’ll handle this in SW.

  3. Adding TLFIFOFixer() for the master node didn’t help. Compilation failed with the following:

[error] Caused by: java.lang.IllegalArgumentException: requirement failed
[error] at scala.Predef$.require(Predef.scala:264)
[error] at freechips.rocketchip.tilelink.TLFragmenter$$anon$2.$anonfun$new$4(Fragmenter.scala:74)

Which is
require (!manager.anySupportAcquireB)

  4. After poking around in the code, the best place I found to insert the Fragmenter was the MemoryBus, since my slave is AXI4 DDR. I did the following:

// MemoryBus

TLNameNode(name)): OutwardNodeHandle[D,U,E,B] = {
to("memory_controller" named name) { gen := TLBuffer(buffer) := TLFragmenter(8, 256, holdFirstDeny = true) := outwardNode }
}

This compiled, but I got an “old” error for a 256B burst from the TL Manager:

Assertion failed: 'A' channel carries Get type unsupported by manager (connected at CrossingHelper.scala:30:80)
at Monitor.scala:72 assert (edge.manager.supportsGetSafe(edge.address(bundle), bundle.size), "'A' channel carries Get type unsupported by manager" + extra)

The error above was my starting point in looking for how to increase the burst size for software.

Can I do better than this? I do have a solution, but it requires redesigning the top levels, which I’d rather not do…