
HELIA platform · Ahead-of-time compiler
LiteRT models, compiled to deterministic C for Ambiq silicon.
heliaAOT lowers .tflite flatbuffers into static C modules tuned for the Cortex-M, DSP, and NPU resources on each Ambiq SoC. No on-device interpreter, no runtime arena guessing — every byte that ships in your firmware is reviewable source you generated at build time.
What it is
A static C compiler for the LiteRT models you already deploy.
Feed it any .tflite flatbuffer, whether exported from TensorFlow, JAX, PyTorch via ai-edge-torch, or your own export pipeline. heliaAOT lowers the graph through an internal IR, resolves kernels and memory layout against your target Ambiq SoC, and emits a focused set of C sources, headers, and build files. The model stops being a runtime payload and becomes reviewable generated code.
Why go ahead-of-time
The headline numbers.
Get started
Generate your first module.
Install the CLI, point it at a .tflite model, and pick an Ambiq target. The output is a ready-to-build C module — drop it into CMake, Zephyr, neuralSPOT, or a CMSIS-Pack.
Run the CLI without installing — uvx fetches heliaAOT into an ephemeral env and executes it. Best for trial conversions and CI smoke checks.
Install as a persistent CLI on your PATH with uv tool or pipx. Updates atomically, isolated from your project envs.
Bring your existing .tflite model: no retraining, no API changes required.
Pick your target
Apollo510 EVB by default. Swap targets instantly with --platform.name.
Choose packaging
CMake, Zephyr, neuralSPOT, or CMSIS-Pack — one flag selects the output format.
Hardware-aware compilation
Made for Ambiq silicon.
heliaAOT understands the Cortex-M variant, vector ISA, DSP extensions, and memory hierarchy of every Ambiq SoC it targets. That knowledge is compiled in — long before your firmware build kicks off.
On Apollo510, heliaAOT auto-selects MVE-tuned int8 and int16 kernels for conv, matmul, and elementwise paths. Vectorized throughput, zero intrinsics to hand-write.
On Apollo3 and Apollo4, heliaAOT routes through Arm DSP and CMSIS-NN paths matched to the core's SIMD extensions — not scalar reference code.
ITCM, DTCM, SRAM, MRAM, PSRAM. The planner knows each tier's size and priority — and where your activations, constants, and persistent state actually belong.
Multi-arena layout out of the box. Override placement per-tensor or per-role, stage weights between memories, or hand arena ownership to the application.
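To make the idea of tier-aware placement concrete, here is a minimal Python sketch of a greedy planner with per-tensor and per-role overrides. It is an illustration only, not heliaAOT's planner: the Tier and Buffer classes, the place_buffers function, and every capacity and buffer size below are hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    capacity: int  # bytes; placeholder values, not real Ambiq memory sizes


@dataclass(frozen=True)
class Buffer:
    name: str
    size: int  # bytes
    role: str  # e.g. "scratch", "activation", "constant", "persistent"


def place_buffers(
    buffers: list[Buffer],
    tiers: list[Tier],
    overrides: Optional[dict[str, str]] = None,
) -> dict[str, str]:
    """Greedy placement: tiers are listed fastest first; each buffer takes the
    first tier with enough free space. `overrides` maps a buffer name or a role
    to a tier name; a name match wins over a role match."""
    overrides = overrides or {}
    free = {tier.name: tier.capacity for tier in tiers}

    def pinned_tier(buf: Buffer) -> Optional[str]:
        return overrides.get(buf.name, overrides.get(buf.role))

    placement: dict[str, str] = {}
    # Pinned buffers go first so unpinned ones cannot take their space;
    # within each group, larger buffers are placed before smaller ones.
    ordered = sorted(buffers, key=lambda b: (pinned_tier(b) is None, -b.size))
    for buf in ordered:
        pinned = pinned_tier(buf)
        candidates = [pinned] if pinned else [tier.name for tier in tiers]
        for tier_name in candidates:
            if free[tier_name] >= buf.size:
                free[tier_name] -= buf.size
                placement[buf.name] = tier_name
                break
        else:
            raise MemoryError(f"{buf.name} ({buf.size} B) does not fit in {candidates}")
    return placement


# Hypothetical tiers (fastest first) and buffers, for illustration only.
TIERS = [Tier("DTCM", 64 * 1024), Tier("SRAM", 256 * 1024), Tier("MRAM", 1024 * 1024)]
BUFFERS = [
    Buffer("scratch", 48 * 1024, "scratch"),
    Buffer("activations", 96 * 1024, "activation"),
    Buffer("weights", 200 * 1024, "constant"),
    Buffer("state", 8 * 1024, "persistent"),
]

if __name__ == "__main__":
    print(place_buffers(BUFFERS, TIERS, overrides={"constant": "MRAM"}))
```

With these placeholder numbers, the weights are pinned to MRAM by the role override, the activations are too large for DTCM and fall through to SRAM, and the scratch and state buffers both fit in DTCM.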
Progressive control
Maximum efficiency. Maximum flexibility.
Most projects ship by pointing the CLI at a model and a platform — sensible defaults handle the rest. When you need more, the same compiler exposes per-tensor placement, per-arena memory routing, and per-operator overrides. No plugin layer, no second tool.
Sensible defaults: planner picks arenas, kernels match the SoC, weights stay in MRAM.
memory:
  planner: greedy
  tensors:
    - type: scratch
      attributes:
        memory: DTCM
    - type: constant
      attributes:
        memory: MRAM
        constant_destination_memory: DTCM
Push the hot scratch arena into DTCM; stage constants from MRAM into DTCM at boot.
operators:
  - type: "*"
    attributes:
      code_placement: MRAM
      scratch_placement: SRAM
  - type: CONV_2D
    attributes:
      code_placement: ITCM
      scratch_placement: DTCM
  - type: CONV_2D
    id: "9"
    attributes: { code_placement: MRAM }
Wildcard defaults, type-level overrides, and per-instance exceptions — resolved by specificity.
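One way to read "resolved by specificity" is sketched below in Python. This is a stand-alone illustration of the precedence rule, not heliaAOT's resolver: the function names are hypothetical, and the sketch assumes that attributes merge key by key, so a more specific rule only replaces the keys it sets.

```python
from typing import Any


def rule_specificity(rule: dict[str, Any]) -> int:
    """Rank a rule: wildcard (0) < type-level (1) < per-instance (2)."""
    if "id" in rule:
        return 2
    if rule["type"] != "*":
        return 1
    return 0


def rule_matches(rule: dict[str, Any], op_type: str, op_id: str) -> bool:
    """A rule applies when its type is the wildcard or equals the operator
    type, and its id (if present) equals the operator id."""
    if rule["type"] not in ("*", op_type):
        return False
    return "id" not in rule or rule["id"] == op_id


def resolve_attributes(rules: list[dict[str, Any]], op_type: str, op_id: str) -> dict[str, Any]:
    """Apply matching rules from least to most specific, so that more
    specific rules overwrite the keys set by broader ones."""
    merged: dict[str, Any] = {}
    matching = [rule for rule in rules if rule_matches(rule, op_type, op_id)]
    for rule in sorted(matching, key=rule_specificity):
        merged.update(rule.get("attributes", {}))
    return merged


# The same three rules as the operators config above, written as Python dicts.
RULES = [
    {"type": "*", "attributes": {"code_placement": "MRAM", "scratch_placement": "SRAM"}},
    {"type": "CONV_2D", "attributes": {"code_placement": "ITCM", "scratch_placement": "DTCM"}},
    {"type": "CONV_2D", "id": "9", "attributes": {"code_placement": "MRAM"}},
]

if __name__ == "__main__":
    for op_type, op_id in [("FULLY_CONNECTED", "1"), ("CONV_2D", "3"), ("CONV_2D", "9")]:
        print(op_type, op_id, resolve_attributes(RULES, op_type, op_id))
```

Under the merge assumption, CONV_2D operator "9" ends up with its code in MRAM (the per-instance rule) while keeping its scratch in DTCM (inherited from the type-level rule). Whether the real compiler merges per key or replaces the whole attribute set is not stated here, so treat that detail as an assumption.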
Programmatic access
A compiler your tooling can drive.
heliaAOT is a Python library first, CLI second. Import AotConverter in a notebook, sweep configs in CI, register custom ops and kernels — everything the command line does is one function call away.
Register LiteRT parsers and AOT operator classes through a RegistryContext customizer — one canonical key wires the parser, the AIR op_type, and the codegen handler.
Override built-in handlers with allow_override=True to ship hand-tuned kernels — vendor intrinsics, accelerator dispatch, or a profiled fast-path for one hot op.
Sweep platforms, planners, and quantizations from a single Python harness. Diff generated C between runs, gate firmware on size/latency budgets, regenerate on every model checkpoint.
from helia_aot.converter import AotConverter
from helia_aot.registry.context import RegistryContext, build_default_registry_context
from my_plugin.ops import parse_custom_fft, CustomFftOperator


def customize_registry(context: RegistryContext) -> None:
    # Register the LiteRT parser and the AOT operator class under the same key.
    context.litert_parsers.register("CUSTOM_FFT", parse_custom_fft, overwrite=True)
    context.aot_operator_classes.register("CUSTOM_FFT", CustomFftOperator, overwrite=True)


registry_context = build_default_registry_context(
    customizers=[customize_registry],
    allow_override=True,  # permit replacing built-in handlers
)

# `config` is the conversion configuration, built elsewhere (not shown here).
AotConverter(config, registry_context=registry_context).convert()
What you get
Built for real firmware, not just model export.
Every operator, tensor, and arena lives in generated source you can read and review before the build runs.
No interpreter, no resolver, no unused kernels — only the code your specific graph actually needs.
Share constant arenas or take external ownership of buffers for multi-model firmware patterns.
Vela-compiled subgraphs collapse into a single Ethos-U operator with command-stream wiring done for you.
Register custom LiteRT op parsers and AOT handlers for product-specific operators or kernels.
Deterministic output means you can diff generated C between commits, gate on size budgets, and regenerate on every model checkpoint.
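The diff-and-gate workflow can be sketched with nothing but the Python standard library. The helper names below (list_sources, diff_generated_sources, generated_source_bytes, within_budget) are hypothetical, and the sketch assumes the generated module is a directory tree of .c and .h files. Source size is only a rough proxy here; a real gate would more likely measure the linked firmware image.

```python
import difflib
import tempfile
from pathlib import Path

SOURCE_SUFFIXES = {".c", ".h"}


def list_sources(root: Path) -> dict[str, Path]:
    """Map each generated C source or header to its path, keyed by relative name."""
    return {
        path.relative_to(root).as_posix(): path
        for path in root.rglob("*")
        if path.is_file() and path.suffix in SOURCE_SUFFIXES
    }


def diff_generated_sources(old_dir: Path, new_dir: Path) -> list[str]:
    """Return unified-diff lines for every source that differs between two
    output trees. Files present on only one side diff against empty content."""
    old_files, new_files = list_sources(old_dir), list_sources(new_dir)
    diff: list[str] = []
    for name in sorted(old_files.keys() | new_files.keys()):
        old_lines = old_files[name].read_text().splitlines(keepends=True) if name in old_files else []
        new_lines = new_files[name].read_text().splitlines(keepends=True) if name in new_files else []
        diff.extend(
            difflib.unified_diff(old_lines, new_lines, fromfile=f"old/{name}", tofile=f"new/{name}")
        )
    return diff


def generated_source_bytes(out_dir: Path) -> int:
    """Total size in bytes of the generated sources and headers."""
    return sum(path.stat().st_size for path in list_sources(out_dir).values())


def within_budget(out_dir: Path, max_bytes: int) -> bool:
    """True when the generated sources fit the given byte budget."""
    return generated_source_bytes(out_dir) <= max_bytes


if __name__ == "__main__":
    # Tiny stand-in trees; in practice these would be two compiler output directories.
    with tempfile.TemporaryDirectory() as tmp:
        old, new = Path(tmp, "old"), Path(tmp, "new")
        old.mkdir()
        new.mkdir()
        (old / "model.c").write_text("int a = 1;\n")
        (new / "model.c").write_text("int a = 2;\n")
        print("".join(diff_generated_sources(old, new)), end="")
        print("within budget:", within_budget(new, max_bytes=1024))
```

In CI, a non-empty diff can be attached to the build as an artifact for review, and a failed budget check can fail the job before the firmware build starts.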
Packaging
Ship into the build system you already use.
3 Toolchains
HELIA AI platform
Two paths onto Ambiq silicon. Pick the one you're ready for.
heliaRT gives you a tuned LiteRT runtime — keep the MicroInterpreter API and swap kernels underneath. heliaAOT removes the interpreter entirely: same .tflite input, but the output is static C the linker can place wherever you want. Both share heliaCORE's optimized kernel library.
heliaRT: LiteRT runtime · tuned kernels
Keep going
Pick a next step.
Three commands got you a module. From here, dial in the build, tune the kernels, or measure what landed on the device.