Tensor Attributes
This page explains how to control where tensors live in memory in the generated C inference module. Tensor attributes are supplied either directly on the command line or in a YAML file, and are applied during code generation.
Tensor IDs match the original model (stringified). See "How do I determine entity IDs?" in the docs for tips.
What can you control?
Right now tensor attributes focus on memory placement:
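- memory: Choose a valid target memory for a tensor (e.g., DTCM, SRAM, PSRAM, MRAM).
- constant_destination_memory: For constant tensors, optionally choose a separate writable memory that the weights are copied (staged) into before inference; see "Cold vs staged constants".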
Typical uses:
- Pin big constant weights to MRAM (or external flash/NVM) to save on SRAM/TCM.
- Keep scratch working sets in DTCM for speed.
- Place persistent model state in SRAM or PSRAM depending on size/speed needs.
- Some operators (e.g. FULLY_CONNECTED) are highly memory bound and benefit from having tensors in faster memory (SRAM/TCM).
Memory types
| MemoryType | Typical use | Notes |
|---|---|---|
| ITCM | Instructions (code) | Not commonly used for tensor data. |
| DTCM | Hot scratch/activation buffers | Fast, tightly-coupled; limited capacity. |
| SRAM | General-purpose data | Good default for persistent data if it fits. |
| PSRAM | Large, slower, off-chip working sets | Volatile; great for big tensors that don’t fit on-chip. |
| MRAM | Non-volatile constants (e.g., weights) | Ideal for immutable tensors; treated as read-only at runtime. |
The backend/platform decides how a MemoryType maps to linker sections and where data actually ends up.
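As a minimal sketch of what such a mapping might look like, assume a platform whose linker script defines one output section per memory. The enum and every section name below are hypothetical, and PSRAM is deliberately left unmapped to mimic a board without PSRAM, which is the situation behind the "memory usage looks unchanged" troubleshooting entry.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the MemoryType values in the table above. */
typedef enum { MEM_ITCM, MEM_DTCM, MEM_SRAM, MEM_PSRAM, MEM_MRAM, MEM_COUNT } memory_type_t;

/* Hypothetical platform table: MemoryType -> output section name.
 * NULL means this platform defines no section for that memory. */
static const char *const k_section_names[MEM_COUNT] = {
    [MEM_ITCM]  = ".itcm",
    [MEM_DTCM]  = ".dtcm",
    [MEM_SRAM]  = ".sram",
    [MEM_PSRAM] = NULL,     /* e.g. a board without PSRAM */
    [MEM_MRAM]  = ".mram",
};

/* Returns the section for a MemoryType, or NULL if unmapped or out of range. */
static const char *section_for_memory(memory_type_t mem)
{
    if ((int)mem < 0 || mem >= MEM_COUNT) {
        return NULL;
    }
    return k_section_names[mem];
}

/* Reverse lookup, e.g. to check a linker map against a config.
 * Returns MEM_COUNT when no MemoryType maps to `name`. */
static memory_type_t memory_for_section(const char *name)
{
    for (int m = 0; m < MEM_COUNT; ++m) {
        if (k_section_names[m] != NULL && strcmp(k_section_names[m], name) == 0) {
            return (memory_type_t)m;
        }
    }
    return MEM_COUNT;
}
```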
Rule structure & precedence
Tensor attributes are expressed with a generic “attribute ruleset”:
```yaml
tensors:
  - type: "*"          # Tensor kind: SCRATCH | PERSISTENT | CONSTANT | "*" for all
    id: null           # Tensor id or list of ids, or null for all
    attributes:
      memory: DTCM     # MemoryType
  - type: PERSISTENT
    attributes:
      memory: SRAM
  - id: ["conv2d_3_weight", "conv2d_3_bias"]
    attributes:
      memory: MRAM
  - type: SCRATCH
    id: "large_scratch_tensor"
    attributes:
      memory: PSRAM
```
Matching & precedence (from lowest to highest):
- (type="*", id=None) — catch-all
- (type=KIND, id=None)
- (type="*", id=ID or [IDs])
- (type=KIND, id=ID or [IDs]) — most specific
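KIND must match the tensor’s kind in the model (e.g., SCRATCH, PERSISTENT, CONSTANT). Use "*" to match all kinds. Omitting the type key is equivalent to type: "*", and omitting the id key is equivalent to id: null, so a rule with neither key is the catch-all.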
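Put differently, an id selector outranks a kind selector, and a rule with both outranks either alone. A minimal sketch of matching and ranking, using hypothetical C types (the generator's own rule representation is not documented on this page):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tensor kinds; TENSOR_ANY stands for type: "*". */
typedef enum { TENSOR_ANY, TENSOR_SCRATCH, TENSOR_PERSISTENT, TENSOR_CONSTANT } tensor_kind_t;

typedef struct {
    tensor_kind_t kind;       /* TENSOR_ANY when type is "*" or omitted */
    const char *const *ids;   /* NULL when id is null or omitted */
    size_t n_ids;
} rule_selector_t;

/* True if the rule's selector matches a tensor of this kind and id. */
static bool rule_matches(const rule_selector_t *r, tensor_kind_t kind, const char *id)
{
    if (r->kind != TENSOR_ANY && r->kind != kind) {
        return false;
    }
    if (r->ids == NULL) {
        return true;
    }
    for (size_t i = 0; i < r->n_ids; ++i) {
        if (strcmp(r->ids[i], id) == 0) {
            return true;
        }
    }
    return false;
}

/* Specificity, lowest to highest, following the precedence list:
 * 0 = catch-all, 1 = kind only, 2 = id only, 3 = kind and id. */
static int rule_specificity(const rule_selector_t *r)
{
    return (r->kind != TENSOR_ANY ? 1 : 0) + (r->ids != NULL ? 2 : 0);
}
```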
Examples
1) Global defaults, with targeted overrides
```yaml
tensors:
  # Default: everything to DTCM
  - attributes: { memory: DTCM }
  # Persistent tensors (variables) to SRAM
  - type: PERSISTENT
    attributes: { memory: SRAM }
  # Specific large weights to MRAM
  - id: ["fc_5_weight", "fc_5_bias"]
    attributes: { memory: MRAM }
```
2) Push all scratch to PSRAM, except two hot buffers
```yaml
tensors:
  - type: SCRATCH
    attributes: { memory: PSRAM }
  - type: SCRATCH
    id: ["hot_buf_0", "hot_buf_1"]
    attributes: { memory: DTCM }
```
3) Only move constants to MRAM, leave others on defaults
```yaml
tensors:
  - type: CONSTANT
    attributes: { memory: MRAM }
```
How attributes are applied
During codegen, we compute the effective attributes for each tensor by:
- collecting all matching rules,
- sorting them by specificity, then by config order,
- merging them in that order, so more specific rules override less specific ones and, at equal specificity, later rules win.

The resulting memory selection informs linker placement and section mapping.
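A minimal sketch of the sort-and-merge step, assuming each rule overrides only the attributes it actually sets, so that a kind-wide memory: and an ID-specific constant_destination_memory: combine instead of replacing each other. The types are hypothetical, and rule matching is assumed to have already happened:

```c
#include <stddef.h>
#include <stdlib.h>

typedef enum { MEM_UNSET = 0, MEM_ITCM, MEM_DTCM, MEM_SRAM, MEM_PSRAM, MEM_MRAM } memory_type_t;

/* Attributes a rule may set; MEM_UNSET means "not set by this rule". */
typedef struct {
    memory_type_t memory;
    memory_type_t constant_destination_memory;
} tensor_attrs_t;

/* A rule already known to match the tensor. */
typedef struct {
    int specificity;        /* 0..3, per the precedence list */
    size_t order;           /* position in the config */
    tensor_attrs_t attrs;
} matched_rule_t;

/* Ascending by specificity, then by config order. */
static int by_specificity_then_order(const void *a, const void *b)
{
    const matched_rule_t *x = a, *y = b;
    if (x->specificity != y->specificity) {
        return x->specificity < y->specificity ? -1 : 1;
    }
    if (x->order != y->order) {
        return x->order < y->order ? -1 : 1;
    }
    return 0;
}

/* Sorts `rules` in place and folds them lowest to highest, so each
 * attribute ends up with the value from the last rule that set it. */
static tensor_attrs_t resolve_attrs(matched_rule_t *rules, size_t n)
{
    tensor_attrs_t out = { MEM_UNSET, MEM_UNSET };
    qsort(rules, n, sizeof rules[0], by_specificity_then_order);
    for (size_t i = 0; i < n; ++i) {
        if (rules[i].attrs.memory != MEM_UNSET) {
            out.memory = rules[i].attrs.memory;
        }
        if (rules[i].attrs.constant_destination_memory != MEM_UNSET) {
            out.constant_destination_memory = rules[i].attrs.constant_destination_memory;
        }
    }
    return out;
}
```

Because the comparator breaks ties on config order, the result does not depend on qsort being unstable.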
Finding tensor IDs
- IDs generally match the original model (stringified). Use Netron or our Model Explorer to view them.
- The generated offline documentation includes tables of tensors and a model diagram.
Best practices
- Start with a global default (e.g., scratch → DTCM, constants → MRAM), then override problematic tensors.
- Use PSRAM (if available) for large temporaries that don’t fit on-chip; measure the impact.
- Keep hot activations in DTCM if they’re re-used within a layer block.
- Prefer MRAM for immutable weights to minimize SRAM/TCM pressure.
Troubleshooting
- "My rule didn’t apply." → Check that the tensor’s kind and ID are spelled exactly as in the model, and remember that a more specific matching rule overrides a less specific one.
- "Build succeeded but memory usage looks unchanged." → Confirm that your platform maps the selected MemoryType to a real linker section, and that the tensor is large enough for the shift to be noticeable.
- "Runtime stalls after moving tensors." → Ensure SCRATCH and PERSISTENT tensors (and any constant_destination_memory) are not placed in read-only memory like MRAM.
Tensor backing model
All tensors — scratch, persistent (resource-variable), and
constant (weights) — are backed by per-memory arenas in the
generated C module. Each tensor descriptor carries a
(region, offset, size) triple resolved at runtime against
ctx->arena_buffers[region]. There are no per-tensor static
symbols.
| Role | Arena symbol | Initialization |
|---|---|---|
| scratch | `<prefix>_arena_<mem>` | Caller-supplied or auto-allocated; transient. |
| persistent | `<prefix>_arena_persistent_<mem>` | Codegen emits `memset(slot, 0, size)` per tensor in `context_init`. |
| constant (cold) | `<prefix>_arena_const_<mem>__blob` (a typed array initialized with the weight bytes; `const`-qualified for cold-storage memories, non-`const` when the placement section targets loadable RAM such as DTCM/SRAM, to avoid linker section-flag conflicts) | Read in place; no hydration. |
| constant (staged) | `<prefix>_arena_const_<mem>` (writable runtime arena), populated from `<prefix>_arena_const_<mem>__source` (cold blob) | `<prefix>_hydrate_constants(&ctx)` (weak helper) memcpys source → runtime per arena. |
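To make the runtime resolution concrete, here is a minimal sketch in which two tensors in the same region are simply different offsets into one buffer. Only `ctx->arena_buffers[region]` comes from the text above; the enum, struct, and function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region IDs; the generated header declares the real enum. */
typedef enum { REGION_SCRATCH_DTCM, REGION_PERSISTENT_SRAM, REGION_CONST_MRAM, REGION_COUNT } arena_region_t;

/* Hypothetical descriptor holding the (region, offset, size) triple. */
typedef struct {
    arena_region_t region;
    uint32_t offset;
    uint32_t size;
} tensor_desc_t;

typedef struct {
    uint8_t *arena_buffers[REGION_COUNT];   /* one base pointer per arena */
} model_ctx_t;

/* Resolves a descriptor to a data pointer at runtime. No per-tensor
 * symbol exists: the address is always arena base + offset. */
static void *tensor_data(const model_ctx_t *ctx, const tensor_desc_t *t)
{
    uint8_t *base = ctx->arena_buffers[t->region];
    return base != NULL ? base + t->offset : NULL;
}
```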
Cold vs staged constants
A constant is cold when its source memory equals its runtime
memory (kernels read in place from cold storage; byte-identical to
historical XIP). It is staged when those memories differ — a
single contiguous _source blob lives in the cold memory and the
caller hydrates the writable runtime arena from it before
model_run.
Routing is configured per tensor with the `constant_destination_memory:` attribute:
```yaml
tensors:
  - type: CONSTANT
    attributes:
      memory: MRAM                        # source (cold) memory
      constant_destination_memory: DTCM   # runtime arena memory (staged)
```
- `memory:` selects the source memory the constant bytes are read from. Under cold residency it is also the runtime read location.
- `constant_destination_memory:` selects the destination (writable runtime arena) that the kernels read from during inference. When unset (or equal to `memory:`), the constant is cold and read in place.
- All constants sharing a destination memory share one arena; the planner enforces a single source memory per destination arena, so a single bulk transfer suffices to hydrate it.
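The residency rules reduce to a comparison of the two attributes. A minimal sketch, using hypothetical C types, of classifying a constant and of the planner's single-source-per-destination check (how the real planner reports a conflict is not specified here):

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { MEM_UNSET = 0, MEM_DTCM, MEM_SRAM, MEM_PSRAM, MEM_MRAM, MEM_COUNT } memory_type_t;

typedef struct {
    memory_type_t memory;                        /* source (cold) memory */
    memory_type_t constant_destination_memory;   /* MEM_UNSET if not set */
} const_placement_t;

/* Where kernels read the constant: the destination if set, else the source. */
static memory_type_t runtime_memory(const const_placement_t *c)
{
    return c->constant_destination_memory == MEM_UNSET ? c->memory
                                                       : c->constant_destination_memory;
}

/* Staged iff the runtime memory differs from the source memory. */
static bool is_staged(const const_placement_t *c)
{
    return runtime_memory(c) != c->memory;
}

/* Every staged constant sharing a destination arena must come from the
 * same source memory. Cold constants are skipped: they have no runtime
 * arena to hydrate. Returns false on a conflict. */
static bool sources_consistent(const const_placement_t *cs, size_t n)
{
    memory_type_t source_for_dest[MEM_COUNT] = { MEM_UNSET };
    for (size_t i = 0; i < n; ++i) {
        if (!is_staged(&cs[i])) {
            continue;
        }
        memory_type_t dest = runtime_memory(&cs[i]);
        if (source_for_dest[dest] == MEM_UNSET) {
            source_for_dest[dest] = cs[i].memory;
        } else if (source_for_dest[dest] != cs[i].memory) {
            return false;
        }
    }
    return true;
}
```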
Hydration contract
When at least one staged constant exists, the generator emits a weak `<prefix>_hydrate_constants(&ctx)` helper. Its default body performs one memcpy per staged constant arena.
<prefix>_model_init invokes <prefix>_hydrate_constants(&ctx)
between <prefix>_context_init and the operator init loop, so
staged arenas are populated before any kernel _init hook (or the
first <prefix>_model_run) can observe them. Callers may:
- rely on the call inside `<prefix>_model_init` (the default behavior, which works for every config);
- pre-hydrate by calling `<prefix>_hydrate_constants(&ctx)` after `<prefix>_context_init` but before `<prefix>_model_init`; the default body is idempotent, so `model_init`'s call is then a no-op;
- override the weak symbol with a strong replacement (e.g. DMA, pre-staged HBLRAM, decompression, model swap); `model_init` calls the override at the same fixed point, and the override must call `<prefix>_mark_hydrated()` to satisfy the latch; or
- skip the helper entirely when the destination arena already contains the right bytes (e.g. preloaded by the application), calling `<prefix>_mark_hydrated()` before `<prefix>_model_run` so the latch is satisfied.
<prefix>_model_run returns 200 (hydration required) only as
a defense-in-depth check — it normally fires when a caller has
explicitly invoked <prefix>_clear_hydrated() after a successful
model_init without re-running it. The latch is reset on every
<prefix>_context_init.
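The latch semantics can be sketched as follows, assuming the prefix `net`, a single staged DTCM arena, and a module-global boolean latch. The real generated bodies, context type, and success return code may differ, and in this simplified sketch the caller invokes `net_context_init` separately.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model with prefix "net" and one staged DTCM arena. */
typedef struct { int unused; } net_ctx_t;   /* placeholder context type */

static const uint8_t net_arena_const_dtcm__source[8] = {1, 2, 3, 4, 5, 6, 7, 8};  /* cold blob */
static uint8_t net_arena_const_dtcm[8];                                           /* runtime arena */

static bool net_hydrated;   /* module-global latch (assumed representation) */

void net_mark_hydrated(void)  { net_hydrated = true; }
void net_clear_hydrated(void) { net_hydrated = false; }

/* Default weak body: one memcpy per staged arena, and a no-op once the latch
 * is set. A strong definition elsewhere (DMA, decompression, ...) replaces it
 * and must call net_mark_hydrated() itself. */
__attribute__((weak)) void net_hydrate_constants(net_ctx_t *ctx)
{
    (void)ctx;
    if (net_hydrated) {
        return;
    }
    memcpy(net_arena_const_dtcm, net_arena_const_dtcm__source, sizeof net_arena_const_dtcm);
    net_mark_hydrated();
}

void net_context_init(net_ctx_t *ctx)
{
    (void)ctx;
    net_hydrated = false;        /* the latch is reset on every context_init */
}

int net_model_init(net_ctx_t *ctx)
{
    net_hydrate_constants(ctx);  /* runs before any operator init hook */
    /* ... operator init loop ... */
    return 0;                    /* 0 = success is assumed */
}

int net_model_run(net_ctx_t *ctx)
{
    (void)ctx;
    if (!net_hydrated) {
        return 200;              /* hydration required */
    }
    /* ... kernels read net_arena_const_dtcm ... */
    return 0;
}
```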
Legacy auto_hydrate_constants flag
memory.auto_hydrate_constants is retained for backwards
compatibility but no longer changes runtime behavior:
model_init always hydrates staged-constant arenas before
op-init. Override the weak <prefix>_hydrate_constants symbol
when you need a custom hydration mechanism (DMA, decompression,
pre-staged HBLRAM, etc.).
Per-arena source contiguity
The planner packs all constants destined for the same arena into a single contiguous source blob in destination-offset order. A single bulk transfer suffices regardless of how many constants the arena contains, and DMA-based hydration is straightforward.
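A minimal sketch of why this ordering makes hydration a single transfer, assuming the source blob mirrors the destination arena's layout byte for byte, including alignment gaps. The type and function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical planner input: one staged constant bound for a destination arena. */
typedef struct {
    size_t dest_offset;      /* offset in the destination (runtime) arena */
    size_t size;             /* bytes */
    const uint8_t *bytes;    /* weight data */
} staged_const_t;

static int by_dest_offset(const void *a, const void *b)
{
    const staged_const_t *x = a, *y = b;
    return (x->dest_offset > y->dest_offset) - (x->dest_offset < y->dest_offset);
}

/* Emits the source blob strictly front to back in destination-offset order,
 * zero-filling alignment gaps so every constant sits at its destination
 * offset. Returns the blob length, or 0 on overlap or overflow. */
static size_t pack_source_blob(staged_const_t *cs, size_t n, uint8_t *blob, size_t cap)
{
    size_t cursor = 0;
    qsort(cs, n, sizeof cs[0], by_dest_offset);
    for (size_t i = 0; i < n; ++i) {
        size_t end = cs[i].dest_offset + cs[i].size;
        if (cs[i].dest_offset < cursor || end > cap) {
            return 0;
        }
        memset(blob + cursor, 0, cs[i].dest_offset - cursor);   /* alignment gap */
        memcpy(blob + cs[i].dest_offset, cs[i].bytes, cs[i].size);
        cursor = end;
    }
    return cursor;
}
```

Under that assumption, hydrating the arena is one `memcpy(runtime_arena, blob, length)` or one DMA descriptor, however many constants it holds.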
Persistent zero-init contract
READ_VARIABLE returns zeros on the first invocation when no prior
ASSIGN_VARIABLE has executed. The generator enforces this with an
explicit memset() per persistent in context_init because
caller-supplied arena bytes are not BSS-zeroed.
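A minimal sketch of that per-slot zeroing, with hypothetical names. In this sketch, bytes outside the persistent slots are left untouched:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical persistent-tensor slot within the persistent arena. */
typedef struct {
    size_t offset;
    size_t size;
} persistent_slot_t;

/* Zero every persistent slot, as context_init does: a caller-supplied
 * arena may hold arbitrary bytes, so BSS zeroing cannot be relied on. */
static void zero_persistent_slots(uint8_t *arena, const persistent_slot_t *slots, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        memset(arena + slots[i].offset, 0, slots[i].size);
    }
}
```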
Application wiring with caller-supplied arenas
When the application supplies its own arena buffers
(--no-memory.allocate-arenas), the generated header exposes a
size define and a region enum entry per arena. Each region must be
bound via <prefix>_bind_arena() (or <prefix>_bind_arenas())
before calling <prefix>_model_init(); context_init reads
the module-global bind table directly and does not inspect any
caller-populated ctx->arena_buffers. Two models sharing a
scratch buffer while keeping independent persistent state look
like this:
```c
#include <stdint.h>
#include "model_a.h"
#include "model_b.h"
#define MAX(a, b) ((a) > (b) ? (a) : (b))

// Scratch can be aliased: both models reuse one buffer, sized and aligned for the larger need.
static uint8_t shared_scratch[MAX(model_a_arena_dtcm_size, model_b_arena_dtcm_size)]
    __attribute__((aligned(MAX(model_a_arena_dtcm_alignment, model_b_arena_dtcm_alignment)),
                   section(".dtcm")));
// Persistent buffers are owned per model.
static uint8_t persist_a[model_a_arena_persistent_sram_size]
    __attribute__((aligned(model_a_arena_persistent_sram_alignment), section(".sram")));
static uint8_t persist_b[model_b_arena_persistent_sram_size]
    __attribute__((aligned(model_b_arena_persistent_sram_alignment), section(".sram")));
// Placeholder context type names: use the ones declared in each generated header.
static model_a_ctx_t ctx_a;
static model_b_ctx_t ctx_b;

int app_models_init(void)
{
    // Bind every region before model_init. bind_arena rejects unknown region IDs (1),
    // NULL (2), undersized (3) and misaligned (4) buffers; 0 is assumed to mean success.
    if (model_a_bind_arena(model_a_arena_dtcm, shared_scratch, sizeof(shared_scratch)) ||
        model_a_bind_arena(model_a_arena_persistent_sram, persist_a, sizeof(persist_a)) ||
        model_b_bind_arena(model_b_arena_dtcm, shared_scratch, sizeof(shared_scratch)) ||
        model_b_bind_arena(model_b_arena_persistent_sram, persist_b, sizeof(persist_b))) {
        return -1;
    }
    model_a_model_init(&ctx_a);
    model_b_model_init(&ctx_b);
    return 0;
}
```
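The only mutable memory a model_run call on either context touches
is the shared scratch buffer and that model's own persistent buffer,
so the two models never share persistent state. Because the scratch
buffer is aliased, the two models must not run concurrently (for
example, from different threads or from an interrupt handler).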
When to choose staged constants
- Weights live in slow / non-XIP-friendly storage (e.g. external flash without an XIP window) and must be staged into TCM/SRAM before inference.
- Multi-model swap with a shared constant arena.
- DMA-driven weight load that overlaps with init or runs in parallel with another inference.
- Cold-storage bandwidth needs to be reused between inferences (e.g. shared with logging, sensor capture).
Staged trades RAM (a writable copy of the weights) and a one-time hydration latency for control over weight movement. When RAM is the binding constraint and cold storage is XIP-capable, prefer cold.