
Tensor Attributes

This page explains how to steer where tensors live in memory in the generated C inference module. Tensor attributes can be passed directly on the CLI or in a YAML file, and are applied during code generation.

Tensor IDs match the original model (stringified). See "How do I determine entity IDs?" in the docs for tips.


What can you control?

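Currently, tensor attributes control memory placement:

  • memory: Choose a valid target memory for a tensor (e.g., DTCM, SRAM, PSRAM, MRAM).
  • constant_destination_memory: For CONSTANT tensors, choose a separate runtime memory that the weights are staged into before inference (see "Cold vs staged constants" below).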

Typical uses:

  • Pin big constant weights to MRAM (or external flash/NVM) to save on SRAM/TCM.
  • Keep scratch working sets in DTCM for speed.
  • Place persistent model state in SRAM or PSRAM depending on size/speed needs.
  • Some operators (e.g., FULLY_CONNECTED) are heavily memory-bound and benefit from having their tensors in faster memory (SRAM/TCM).

Memory types

| MemoryType | Typical use | Notes |
|---|---|---|
| ITCM | Instructions (code) | Not commonly used for tensor data. |
| DTCM | Hot scratch/activation buffers | Fast, tightly coupled; limited capacity. |
| SRAM | General-purpose data | Good default for persistent data if it fits. |
| PSRAM | Large, slower, off-chip working sets | Volatile; good for large tensors that don't fit on-chip. |
| MRAM | Non-volatile constants (e.g., weights) | Ideal for immutable tensors, since it is read-only. |

The backend/platform decides how a MemoryType maps to linker sections and where data actually ends up.


Rule structure & precedence

Tensor attributes are expressed with a generic “attribute ruleset”:

```yaml
tensors:
    - type: "*"                         # Tensor kind: SCRATCH | PERSISTENT | CONSTANT | "*" for all
      id: null                          # Tensor id or list of ids, or null for all
      attributes:
        memory: DTCM                    # MemoryType

    - type: PERSISTENT
      attributes:
        memory: SRAM

    - id: ["conv2d_3_weight", "conv2d_3_bias"]
      attributes:
        memory: MRAM

    - type: SCRATCH
      id: "large_scratch_tensor"
      attributes:
        memory: PSRAM
```

Matching & precedence (from lowest to highest):

  1. (type="*", id=None) — catch-all
  2. (type=KIND, id=None)
  3. (type="*", id=ID or [IDs])
  4. (type=KIND, id=ID or [IDs]) — most specific

KIND must match the tensor’s kind in the model (e.g., SCRATCH, PERSISTENT, CONSTANT). Use "*" to match all.


Examples

1) Global defaults, with targeted overrides

```yaml
tensors:
    # Default: everything to DTCM
    - attributes: { memory: DTCM }

    # Persistent tensors (variables) to SRAM
    - type: PERSISTENT
      attributes: { memory: SRAM }

    # Specific large weights to MRAM
    - id: ["fc_5_weight", "fc_5_bias"]
      attributes: { memory: MRAM }
```

2) Push all scratch to PSRAM, except two hot buffers

```yaml
tensors:
   - type: SCRATCH
     attributes: { memory: PSRAM }

   - type: SCRATCH
     id: ["hot_buf_0", "hot_buf_1"]
     attributes: { memory: DTCM }
```

3) Only move constants to MRAM, leave others on defaults

```yaml
tensors:
   - type: CONSTANT
     attributes: { memory: MRAM }
```

How attributes are applied

  • During codegen, we compute the effective attributes per tensor by:

    1. collecting all matching rules,
    2. sorting by specificity and config order,
    3. merging in order (later wins at the same specificity).
  • The resulting memory selection informs linker placement / section mapping.
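
The three resolution steps above can be sketched in C as follows. This is a minimal illustration, not the generator's implementation: rule_t, the KIND_*/MEM_* enums and resolve_memory are hypothetical names, and only the memory attribute is modeled. Visiting specificity levels from lowest to highest, and rules in config order within each level, is equivalent to a stable sort followed by an in-order merge.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the codegen's internal rule types (illustration only). */
typedef enum { KIND_ANY, KIND_SCRATCH, KIND_PERSISTENT, KIND_CONSTANT } tensor_kind_t;
typedef enum { MEM_UNSET, MEM_DTCM, MEM_SRAM, MEM_PSRAM, MEM_MRAM } memory_t;

typedef struct {
    tensor_kind_t type;      /* KIND_ANY corresponds to type: "*" */
    const char *const *ids;  /* NULL corresponds to id: null (match every id) */
    size_t n_ids;
    memory_t memory;         /* MEM_UNSET: this rule does not set memory */
} rule_t;

/* Specificity rank: 0 catch-all, 1 kind only, 2 id only, 3 kind + id. */
static int specificity(const rule_t *r) {
    return (r->ids != NULL ? 2 : 0) + (r->type != KIND_ANY ? 1 : 0);
}

static int rule_matches(const rule_t *r, tensor_kind_t kind, const char *id) {
    if (r->type != KIND_ANY && r->type != kind) return 0;
    if (r->ids == NULL) return 1;
    for (size_t i = 0; i < r->n_ids; ++i)
        if (strcmp(r->ids[i], id) == 0) return 1;
    return 0;
}

/* Merge matching rules from least to most specific; within a level, rules are
 * visited in config order, so a later rule overrides an earlier one. */
memory_t resolve_memory(const rule_t *rules, size_t n_rules,
                        tensor_kind_t kind, const char *id) {
    memory_t effective = MEM_UNSET;
    for (int level = 0; level <= 3; ++level)
        for (size_t i = 0; i < n_rules; ++i)
            if (specificity(&rules[i]) == level &&
                rule_matches(&rules[i], kind, id) &&
                rules[i].memory != MEM_UNSET)
                effective = rules[i].memory;
    return effective;
}
```

With the rules from example 2, a SCRATCH tensor named hot_buf_0 resolves to DTCM (kind + id beats kind only), while any other scratch tensor resolves to PSRAM. In this sketch MEM_UNSET means no rule matched, so the tensor stays on the defaults.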


Finding tensor IDs

  • IDs generally match the original model (stringified). Use Netron or our Model Explorer to view them.
  • The generated offline documentation includes tables of tensors and a model diagram.

Best practices

  • Start with a global default (e.g., scratch → DTCM, constants → MRAM), then override problematic tensors.
  • Use PSRAM (if available) for large temporaries that don’t fit on-chip; measure the impact.
  • Keep hot activations in DTCM if they’re re-used within a layer block.
  • Prefer MRAM for immutable weights to minimize SRAM/TCM pressure.

Troubleshooting

  • "My rule didn’t apply." → Check that the tensor’s kind and ID are spelled exactly as in the model, and whether a more specific rule (or a later rule at the same specificity) overrides yours.
  • "Build succeeded but memory usage looks unchanged." → Confirm your platform maps the selected MemoryType to a real section, and that the tensor is large enough to notice the shift.
  • "Runtime stalls after moving tensors." → Ensure SCRATCH and PERSISTENT tensors are not placed in read-only memory like MRAM.

Tensor backing model

All tensors — scratch, persistent (resource-variable), and constant (weights) — are backed by per-memory arenas in the generated C module. Each tensor descriptor carries a (region, offset, size) triple resolved at runtime against ctx->arena_buffers[region]. There are no per-tensor static symbols.
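
A minimal sketch of how a (region, offset, size) descriptor could resolve to a data pointer at runtime. The struct layouts, the REGION_* enum, the arena_sizes field and tensor_data are hypothetical stand-ins for whatever the generated header actually defines; the bounds check is added for illustration and may not exist in generated code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region enum; the generated header defines its own entries. */
enum { REGION_DTCM, REGION_PERSISTENT_SRAM, REGION_COUNT };

typedef struct {
    uint32_t region;  /* index into ctx->arena_buffers */
    uint32_t offset;  /* byte offset within that arena */
    uint32_t size;    /* tensor size in bytes */
} tensor_desc_t;

typedef struct {
    uint8_t *arena_buffers[REGION_COUNT];
    size_t arena_sizes[REGION_COUNT];  /* assumed; used only for the bounds check */
} model_context_t;

/* Resolve a descriptor against the bound arenas; NULL on an unbound region or overrun. */
static uint8_t *tensor_data(const model_context_t *ctx, const tensor_desc_t *d) {
    if (d->region >= REGION_COUNT || ctx->arena_buffers[d->region] == NULL)
        return NULL;
    if ((size_t)d->offset + d->size > ctx->arena_sizes[d->region])
        return NULL;
    return ctx->arena_buffers[d->region] + d->offset;
}
```

A tensor's address depends only on which buffer backs its region, which is why no per-tensor static symbols are needed.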

| Role | Arena symbol | Initialization |
|---|---|---|
| scratch | <prefix>_arena_<mem> | Caller-supplied or auto-allocated; transient. |
| persistent | <prefix>_arena_persistent_<mem> | Codegen emits memset(slot, 0, size) per tensor in context_init. |
| constant (cold) | <prefix>_arena_const_<mem>__blob: a typed array initialized with the weight bytes. It is const-qualified for cold-storage memories, and non-const when the placement section targets loadable RAM such as DTCM/SRAM, to avoid linker section-flag conflicts. | Read in place; no hydration. |
| constant (staged) | <prefix>_arena_const_<mem> (writable runtime arena), populated from <prefix>_arena_const_<mem>__source (cold blob) | <prefix>_hydrate_constants(&ctx) (weak helper) memcpys source → runtime, once per arena. |

Cold vs staged constants

A constant is cold when its source memory equals its runtime memory: kernels read it in place from cold storage, byte-identical to the historical XIP behavior. It is staged when those memories differ: a single contiguous _source blob lives in the cold memory, and the writable runtime arena is hydrated from it before model_run (by default inside model_init; see "Hydration contract" below).

Routing is per-tensor via the constant_destination_memory: attribute:

```yaml
tensors:
    - type: CONSTANT
      attributes:
        memory: MRAM                       # source (cold) memory
        constant_destination_memory: DTCM  # runtime arena memory (staged)
```

  • memory: selects the source memory the constant bytes are read from. Under cold residency it is also the runtime read location.
  • constant_destination_memory: selects the destination (writable runtime arena) the kernels read from during inference. When unset (or equal to memory:) the constant is cold and read in place.
  • All constants sharing a destination memory share one arena; the planner enforces a single source memory per destination arena so a single bulk transfer suffices to hydrate it.
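
The cold/staged classification and the planner's single-source constraint can be sketched as follows. const_placement_t, is_staged and check_single_source are hypothetical names; the real planner runs at code-generation time and is not necessarily written in C.

```c
#include <stddef.h>

/* Hypothetical planner-side view of one constant (illustration only). */
typedef enum { MEM_NONE, MEM_DTCM, MEM_SRAM, MEM_PSRAM, MEM_MRAM, MEM_COUNT } memory_t;

typedef struct {
    memory_t memory;       /* memory: the source (cold) memory */
    memory_t destination;  /* constant_destination_memory:, MEM_NONE when unset */
} const_placement_t;

/* Staged only when a destination is set and differs from the source; otherwise cold. */
static int is_staged(const const_placement_t *c) {
    return c->destination != MEM_NONE && c->destination != c->memory;
}

/* One source memory per destination arena, so one bulk transfer can hydrate it.
 * Returns the index of the first constant that breaks the rule, or -1 if none does. */
static long check_single_source(const const_placement_t *consts, size_t n) {
    memory_t source_for[MEM_COUNT] = {MEM_NONE};  /* MEM_NONE: no staged constant seen yet */
    for (size_t i = 0; i < n; ++i) {
        if (!is_staged(&consts[i]))
            continue;  /* cold constants are read in place and never hydrated */
        memory_t dst = consts[i].destination;
        if (source_for[dst] == MEM_NONE)
            source_for[dst] = consts[i].memory;
        else if (source_for[dst] != consts[i].memory)
            return (long)i;
    }
    return -1;
}
```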

Hydration contract

When at least one staged constant exists, the generator emits a weak helper:

```c
__attribute__((weak)) int32_t <prefix>_hydrate_constants(
    <prefix>_model_context_t *ctx);
```

The default body performs one memcpy per staged constant arena. <prefix>_model_init invokes <prefix>_hydrate_constants(&ctx) between <prefix>_context_init and the operator init loop, so staged arenas are populated before any kernel _init hook (or the first <prefix>_model_run) can observe them. Callers may:

  • rely on the in-model_init call (default behavior — works for every config),
  • pre-hydrate by calling <prefix>_hydrate_constants(&ctx) after <prefix>_context_init but before <prefix>_model_init; the default body is idempotent so model_init's call is then a no-op,
  • override the weak symbol with a strong replacement (e.g. DMA, pre-staged HBLRAM, decompression, model-swap) — model_init calls the override at the same fixed point. The override must call <prefix>_mark_hydrated() to satisfy the latch, or
  • skip the helper entirely when the destination arena already contains the right bytes (e.g. preloaded by the application); call <prefix>_mark_hydrated() before <prefix>_model_run so the latch is satisfied.

<prefix>_model_run returns 200 (hydration required) only as a defense-in-depth check — it normally fires when a caller has explicitly invoked <prefix>_clear_hydrated() after a successful model_init without re-running it. The latch is reset on every <prefix>_context_init.
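
The hydration latch can be modeled with a small C sketch. Everything here is hypothetical: demo stands in for <prefix>, the single staged arena and the module-global latch are assumptions, and the real generated functions may differ in signature (for example, whether <prefix>_mark_hydrated takes a context). The sketch assumes the caller has run demo_context_init before demo_model_init.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical one-arena model; "demo" plays the role of <prefix>. */
#define DEMO_CONST_DTCM_SIZE 4u

/* Cold source blob and the writable runtime arena it hydrates. */
static const uint8_t demo_arena_const_dtcm__source[DEMO_CONST_DTCM_SIZE] = {1, 2, 3, 4};
static uint8_t demo_arena_const_dtcm[DEMO_CONST_DTCM_SIZE];

typedef struct { int reserved; } demo_model_context_t;  /* real context holds arena pointers etc. */

static int demo_hydrated;  /* assumed module-global hydration latch */

void demo_context_init(demo_model_context_t *ctx) { (void)ctx; demo_hydrated = 0; }  /* latch reset */
void demo_mark_hydrated(void) { demo_hydrated = 1; }
void demo_clear_hydrated(void) { demo_hydrated = 0; }

/* Default body: one memcpy per staged arena; a no-op once the latch is set. */
__attribute__((weak)) int32_t demo_hydrate_constants(demo_model_context_t *ctx) {
    (void)ctx;
    if (demo_hydrated)
        return 0;
    memcpy(demo_arena_const_dtcm, demo_arena_const_dtcm__source, DEMO_CONST_DTCM_SIZE);
    demo_mark_hydrated();
    return 0;
}

/* Hydrates before the (elided) operator init loop, so no kernel _init sees stale weights. */
int32_t demo_model_init(demo_model_context_t *ctx) {
    int32_t rc = demo_hydrate_constants(ctx);
    if (rc != 0)
        return rc;
    /* ... operator init loop ... */
    return 0;
}

int32_t demo_model_run(demo_model_context_t *ctx) {
    (void)ctx;
    if (!demo_hydrated)
        return 200;  /* hydration required */
    /* ... kernels read demo_arena_const_dtcm ... */
    return 0;
}
```

A strong override (DMA, decompression) would define demo_hydrate_constants in another translation unit and end by calling demo_mark_hydrated(), as the contract above requires.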

Legacy auto_hydrate_constants flag

memory.auto_hydrate_constants is retained for backwards compatibility but no longer changes runtime behavior: model_init always hydrates staged-constant arenas before op-init. Override the weak <prefix>_hydrate_constants symbol when you need a custom hydration mechanism (DMA, decompression, pre-staged HBLRAM, etc.).

Per-arena source contiguity

The planner packs all constants destined for the same arena into a single contiguous source blob in destination-offset order. A single bulk transfer suffices regardless of how many constants the arena contains, and DMA-based hydration is straightforward.
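
A sketch of the packing step, assuming the source blob mirrors the destination arena byte for byte (alignment gaps are zero-filled here) so that hydration is a single memcpy of the whole blob. staged_const_t and pack_source_blob are hypothetical names, and the real planner is not necessarily written in C.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical description of one constant routed to a staged arena. */
typedef struct {
    size_t dest_offset;    /* offset of the constant in the runtime arena */
    size_t size;           /* size in bytes */
    const uint8_t *bytes;  /* weight bytes from the model */
} staged_const_t;

static int by_dest_offset(const void *a, const void *b) {
    size_t x = ((const staged_const_t *)a)->dest_offset;
    size_t y = ((const staged_const_t *)b)->dest_offset;
    return (x > y) - (x < y);
}

/* Sorts consts by destination offset and writes each at that same offset in blob,
 * zero-filling gaps. Returns the blob length, or SIZE_MAX on overlap or overflow. */
static size_t pack_source_blob(staged_const_t *consts, size_t n, uint8_t *blob, size_t cap) {
    qsort(consts, n, sizeof *consts, by_dest_offset);
    size_t end = 0;
    for (size_t i = 0; i < n; ++i) {
        const staged_const_t *c = &consts[i];
        if (c->dest_offset < end || c->size > cap || c->dest_offset > cap - c->size)
            return SIZE_MAX;
        memset(blob + end, 0, c->dest_offset - end);  /* alignment gap */
        memcpy(blob + c->dest_offset, c->bytes, c->size);
        end = c->dest_offset + c->size;
    }
    return end;
}
```

Hydrating the arena is then one memcpy (or one DMA transfer) of the returned length from the blob.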

Persistent zero-init contract

READ_VARIABLE returns zeros on the first invocation when no prior ASSIGN_VARIABLE has executed. The generator enforces this with an explicit memset() per persistent in context_init because caller-supplied arena bytes are not BSS-zeroed.
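
A minimal sketch of the per-slot zeroing, assuming a hypothetical persistent_slot_t descriptor holding an offset and size into the persistent arena. Only the persistent slots are cleared; other bytes of a caller-supplied buffer are left as they were.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor for one persistent (resource-variable) slot. */
typedef struct {
    size_t offset;  /* byte offset in the persistent arena */
    size_t size;    /* byte size of the variable */
} persistent_slot_t;

/* One memset per persistent slot: a caller-supplied arena is not BSS
 * and may still hold leftover bytes from earlier use. */
static void zero_persistents(uint8_t *arena, const persistent_slot_t *slots, size_t n) {
    for (size_t i = 0; i < n; ++i)
        memset(arena + slots[i].offset, 0, slots[i].size);
}
```

After this runs, a READ_VARIABLE that precedes any ASSIGN_VARIABLE sees all-zero bytes.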

Application wiring with caller-supplied arenas

When the application supplies its own arena buffers (--no-memory.allocate-arenas), the generated header exposes a size define and a region enum entry per arena. Each region must be bound via <prefix>_bind_arena() (or <prefix>_bind_arenas()) before calling <prefix>_model_init(); context_init reads the module-global bind table directly and does not inspect any caller-populated ctx->arena_buffers. Two models sharing a scratch buffer while keeping independent persistent state look like this:

```c
#include <stdint.h>
#include "model_a.h"
#include "model_b.h"

#ifndef MAX
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#endif

// Scratch can be aliased: both models reuse the same buffer, so it must
// satisfy the larger size and the stricter alignment of the two.
static uint8_t shared_scratch[
    MAX(model_a_arena_dtcm_size, model_b_arena_dtcm_size)
] __attribute__((aligned(MAX(model_a_arena_dtcm_alignment, model_b_arena_dtcm_alignment)),
                 section(".dtcm")));

// Persistent buffers are owned per-model.
static uint8_t persist_a[model_a_arena_persistent_sram_size]
    __attribute__((aligned(model_a_arena_persistent_sram_alignment), section(".sram")));
static uint8_t persist_b[model_b_arena_persistent_sram_size]
    __attribute__((aligned(model_b_arena_persistent_sram_alignment), section(".sram")));

static model_a_model_context_t ctx_a;
static model_b_model_context_t ctx_b;

// Application-side init (illustrative name). Returns 0 on success.
int app_models_init(void)
{
    // Bind every region before model_init; bind_arena rejects unknown region
    // IDs (1), NULL buffers (2), undersized (3) and misaligned (4) buffers.
    if (model_a_bind_arena(model_a_arena_dtcm,            shared_scratch, sizeof(shared_scratch)) != 0) return -1;
    if (model_a_bind_arena(model_a_arena_persistent_sram, persist_a,      sizeof(persist_a)) != 0)      return -1;
    if (model_b_bind_arena(model_b_arena_dtcm,            shared_scratch, sizeof(shared_scratch)) != 0) return -1;
    if (model_b_bind_arena(model_b_arena_persistent_sram, persist_b,      sizeof(persist_b)) != 0)      return -1;

    if (model_a_model_init(&ctx_a) != 0) return -1;
    if (model_b_model_init(&ctx_b) != 0) return -1;
    return 0;
}
```

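Calling model_run on either context touches only the shared scratch buffer and that model's own persistent buffer, so the two models never share persistent state. Because the scratch buffer is aliased, their model_run calls must not overlap; run them one after the other (for example, from the same thread).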

When to choose staged constants

  • Weights live in slow / non-XIP-friendly storage (e.g. external flash without an XIP window) and must be staged into TCM/SRAM before inference.
  • Multi-model swap with a shared constant arena.
  • DMA-driven weight load that overlaps with init or runs in parallel with another inference.
  • Cold-storage bandwidth needs to be reused between inferences (e.g. shared with logging, sensor capture).

Staged trades RAM (a writable copy of the weights) and a one-time hydration latency for control over weight movement. When RAM is the binding constraint and cold storage is XIP-capable, prefer cold.