Memory Placement¶
NSX provides portable memory-placement macros in nsx_mem.h that let you
control where variables and code are physically located — without writing
SoC-specific linker attributes.
Why It Matters¶
Ambiq SoCs have multiple memory regions with very different performance and capacity characteristics:
| Region | Size (Apollo510) | Access speed | Typical use |
|---|---|---|---|
| MRAM (flash) | 4 MB | Slow (wait states) | Code, const data |
| TCM | 496 KB | Fast (0 wait) | Stack, heap, .data, .bss |
| ITCM | 256 KB | Fast (0 wait) | Hot code paths |
| Shared SRAM | 3 MB | Medium (1+ wait) | Large buffers, model weights |
Placing data in the wrong region can mean the difference between 21 ms and 74 ms inference time — a 3.5x difference — with no code changes.
Macros¶
| Macro | Where it goes | Initialized | Use case |
|---|---|---|---|
| NSX_MEM_NVM | MRAM (flash) | In-place | Large const data (LUTs, tables) |
| NSX_MEM_FAST | TCM | Copied from NVM | Fast initialized data |
| NSX_MEM_FAST_BSS | TCM | Zeroed | Fast uninitialized data |
| NSX_MEM_SRAM | Shared SRAM | Copied from NVM | Large initialized buffers, model weights |
| NSX_MEM_SRAM_BSS | Shared SRAM | Zeroed | Tensor arenas, DMA buffers, scratch space |
| NSX_MEM_FAST_CODE | ITCM/DTCM/TCM | Copied from NVM | Hot inner loops, ISRs |
Usage¶
Place the macro at the start of the declaration, before any storage class and the type:
#include <stdalign.h>  // alignas (C11)
#include <stdint.h>    // uint8_t, int16_t
#include "nsx_mem.h"
// 64 KB tensor arena in shared SRAM — zeroed at boot, no copy cost
NSX_MEM_SRAM_BSS alignas(16) static uint8_t g_arena[65536];
// Model weights in shared SRAM — copied from flash at boot
NSX_MEM_SRAM alignas(16) static const uint8_t g_model[] = { 0x1c, 0x00, ... };
// Hot inference kernel in ITCM — 0-wait-state code execution
NSX_MEM_FAST_CODE void my_fast_kernel(void) { ... }
// Large LUT kept in flash — don't waste RAM
NSX_MEM_NVM const int16_t g_lut[8192] = { ... };
SoC Support Matrix¶
Macros degrade gracefully — on SoCs without a particular memory region, the macro expands to nothing and the variable goes to the default section.
| Macro | Apollo3 | Apollo3P | Apollo4P | Apollo510 | Apollo330P |
|---|---|---|---|---|---|
| NSX_MEM_SRAM | default | default | .shared | .shared | .shared |
| NSX_MEM_SRAM_BSS | default | default | .sram_bss | .sram_bss | .sram_bss |
| NSX_MEM_FAST_CODE | default | .tcm | default | .itcm_text | .dtcm_text |
| NSX_MEM_FAST | default | default | default | default | default |
| NSX_MEM_FAST_BSS | default | default | default | default | default |
"default" means the compiler's normal .data or .bss section (typically in TCM).
Linker Section Mapping¶
Each macro targets a specific linker output section, and each of those sections must exist in your linker script. The mapping below matches the Apollo510 column of the support matrix:
Macro → Section → Memory Region → Boot action
─────────────────────────────────────────────────────────────────────
NSX_MEM_SRAM → .shared → SHARED_SRAM → Copy from MRAM
NSX_MEM_SRAM_BSS → .sram_bss → SHARED_SRAM → Zeroed
NSX_MEM_FAST_CODE → .itcm_text → MCU_ITCM → Copy from MRAM
NSX_MEM_FAST → .data → MCU_TCM → Copy from MRAM
NSX_MEM_FAST_BSS → .bss → MCU_TCM → Zeroed
NSX_MEM_NVM → .rodata → MCU_MRAM → In-place
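As a rough illustration of how a placement macro can map a declaration to a linker section, and fall back to the default section when the region is absent, the sketch below defines a hypothetical EXAMPLE_MEM_SRAM_BSS macro. It expands to a GCC/Clang section attribute only when a made-up build flag, EXAMPLE_HAS_SHARED_SRAM, is set. Neither name comes from nsx_mem.h, and the real header may be organized differently; the point is only that the declaration itself compiles unchanged either way.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical build flag: a real build would set something similar for
 * SoCs whose linker script provides a .sram_bss output section. */
#if defined(EXAMPLE_HAS_SHARED_SRAM)
#define EXAMPLE_MEM_SRAM_BSS __attribute__((section(".sram_bss")))
#else
#define EXAMPLE_MEM_SRAM_BSS /* expands to nothing: default .bss */
#endif

/* Same declaration on every target; only the macro expansion differs. */
EXAMPLE_MEM_SRAM_BSS static alignas(16) uint8_t example_arena[4096];

/* Returns 1 if every byte of the arena is zero, 0 otherwise. */
int example_arena_is_zeroed(void) {
    for (size_t i = 0; i < sizeof example_arena; ++i) {
        if (example_arena[i] != 0) {
            return 0;
        }
    }
    return 1;
}

/* Returns the arena size in bytes. */
size_t example_arena_size(void) { return sizeof example_arena; }
```

When the flag is set but the linker script has no matching output section, the linker typically places the input section wherever its orphan-section rules dictate, which is why the sections listed above need to be present.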
Cache Helpers¶
nsx_mem.h also provides lightweight cache control:
uint32_t nsx_cache_enable(void); // Enable I/D cache (or unified cache on AP3/AP4)
void nsx_cache_disable(void);
nsx_cache_enable() returns 0 on success. On Apollo5, it returns non-zero if the
cache power domain (CPDLP) is not active; call nsx_hw_init() or
nsx_minimal_hw_init() first.
On Apollo3, these are no-ops. On Apollo4, they configure the unified cache.
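The call order described above can be sketched as follows. To keep the example buildable on a host machine, it supplies small stand-in versions of the two functions; their bodies, and the void signature assumed for nsx_minimal_hw_init(), are illustration-only assumptions and not taken from the SDK. The helper name example_cache_bringup is likewise hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-ins so this sketch builds without the SDK. On target,
 * remove them and include "nsx_mem.h" plus the header that declares the
 * init function. The stub behavior and the signature of
 * nsx_minimal_hw_init() are assumptions made for illustration. */
static int stub_cache_domain_on = 0;
static void nsx_minimal_hw_init(void) { stub_cache_domain_on = 1; }
static uint32_t nsx_cache_enable(void) {
    return stub_cache_domain_on ? 0u : 1u;
}

/* Bring up the hardware first, then enable the cache.
 * Returns 0 on success, -1 if the cache could not be enabled. */
int example_cache_bringup(void) {
    nsx_minimal_hw_init();
    if (nsx_cache_enable() != 0u) {
        return -1; /* e.g. cache power domain not active */
    }
    return 0;
}
```

Checking the return value rather than ignoring it makes a missing init call visible at startup instead of showing up later as unexplained slow inference.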
Practical Guidance¶
Model weights: TCM vs SRAM¶
For small models (under ~200 KB), place weights in TCM for 0-wait-state access. This is the default — just declare the array normally.
For large models, use NSX_MEM_SRAM to avoid overflowing TCM. The tradeoff
is slightly higher access latency (partially mitigated by D-cache).
// Small model: non-const so it lands in .data (TCM); a const array
// would normally be emitted to .rodata, which the linker keeps in MRAM
static uint8_t small_model[] = { ... };
// Large model — goes to shared SRAM (more room, slightly slower)
NSX_MEM_SRAM static const uint8_t large_model[] = { ... };
Tensor arenas¶
Tensor arenas don't need initialization — use NSX_MEM_SRAM_BSS or
NSX_MEM_FAST_BSS to avoid boot-time copy cost:
// Pick one of the following.
// In SRAM if it's too big for TCM
NSX_MEM_SRAM_BSS alignas(16) static uint8_t sram_arena[256 * 1024];
// In TCM if it fits (faster, recommended for small arenas)
NSX_MEM_FAST_BSS alignas(16) static uint8_t tcm_arena[64 * 1024];
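One possible way to make the "TCM if it fits, SRAM otherwise" choice automatic is to select the macro at compile time from the arena size. The sketch below assumes a project-defined budget, EXAMPLE_TCM_ARENA_BUDGET, which is a hypothetical number and not an SDK value; all EXAMPLE_ and example_ names are invented for illustration. Empty fallbacks for the two NSX macros are provided only so the snippet builds on a host without nsx_mem.h.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Host fallbacks so the sketch builds without the SDK. When "nsx_mem.h"
 * is included first, these definitions are skipped. */
#ifndef NSX_MEM_FAST_BSS
#define NSX_MEM_FAST_BSS
#endif
#ifndef NSX_MEM_SRAM_BSS
#define NSX_MEM_SRAM_BSS
#endif

/* Hypothetical project settings, not SDK values. */
#define EXAMPLE_TCM_ARENA_BUDGET (128u * 1024u) /* TCM bytes set aside for arenas */
#define EXAMPLE_ARENA_BYTES      (64u * 1024u)

#if EXAMPLE_ARENA_BYTES <= EXAMPLE_TCM_ARENA_BUDGET
#define EXAMPLE_ARENA_MEM    NSX_MEM_FAST_BSS /* fits: keep it in TCM */
#define EXAMPLE_ARENA_IN_TCM 1
#else
#define EXAMPLE_ARENA_MEM    NSX_MEM_SRAM_BSS /* too big: shared SRAM */
#define EXAMPLE_ARENA_IN_TCM 0
#endif

EXAMPLE_ARENA_MEM static alignas(16) uint8_t example_tensor_arena[EXAMPLE_ARENA_BYTES];

/* Accessors so other translation units can use the arena. */
uint8_t *example_tensor_arena_ptr(void) { return example_tensor_arena; }
size_t example_tensor_arena_size(void) { return sizeof example_tensor_arena; }
int example_tensor_arena_in_tcm(void) { return EXAMPLE_ARENA_IN_TCM; }
```

With this arrangement, changing EXAMPLE_ARENA_BYTES is enough to move the arena between regions, at the cost of hiding the placement decision behind a preprocessor condition.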
Hot code in ITCM¶
On Apollo510, TFLM kernel code can be placed in ITCM for faster execution.
The linker_script_itcm_sbl.ld variant uses KEEP directives to pull
specific object files into ITCM.
For your own hot functions, use NSX_MEM_FAST_CODE:
#include <stddef.h>  // size_t
#include <stdint.h>  // int16_t
#include "nsx_mem.h"
NSX_MEM_FAST_CODE void my_dsp_kernel(int16_t *buf, size_t len) {
// Runs from ITCM with 0-wait-state instruction fetch
}
ITCM is limited
Apollo510 ITCM is 256 KB. Placing too much code there will cause a linker overflow error. Profile first, then move only the hottest paths.
Backward Compatibility¶
Legacy macros are mapped to the new system:
| Legacy | Maps to |
|---|---|
| AM_SHARED_RW | NSX_MEM_SRAM |
| NS_SRAM_BSS | NSX_MEM_SRAM_BSS |
| NS_PUT_IN_TCM | NSX_MEM_FAST |
Prefer NSX_MEM_* in new code.