HELIA platform · Ahead-of-time compiler

LiteRT models, compiled to deterministic C for Ambiq silicon.

heliaAOT lowers .tflite flatbuffers into static C modules tuned for the Cortex-M, DSP, and NPU resources on each Ambiq SoC. No on-device interpreter, no runtime arena guessing — every byte that ships in your firmware is reviewable source you generated at build time.


What it is

A static C compiler for the LiteRT models you already deploy.

Feed any .tflite flatbuffer — from TensorFlow, JAX, PyTorch via ai-edge-torch, or your own export pipeline. heliaAOT lowers the graph through an internal IR, resolves kernels and memory layout against your target Ambiq SoC, and emits a focused set of C sources, headers, and build files. The model stops being a runtime payload and becomes reviewable generated code.

Why go ahead-of-time

The headline numbers.

Up to 6× · Faster inference than heliaRT

Up to 60% · Smaller flash + RAM footprint

0% · Runtime arena guessing: every byte is planned at compile time

200+ · Built-in kernels across the LiteRT op set

AOT · Compile, don't interpret: ahead-of-time C, no interpreter on device

MVE · Accelerator path: Helium, DSP, and NPU kernels selected automatically

A8W8 · Quantization: A8W8 (int8 activations, int8 weights) and A16W8 (int16 activations, int8 weights) supported end to end

AIR · Unified IR: one graph definition for the parser, planner, and codegen

Get started

Generate your first module.

Install the CLI, point it at a .tflite model, and pick an Ambiq target. The output is a ready-to-build C module — drop it into CMake, Zephyr, neuralSPOT, or a CMSIS-Pack.

Install + convert (PyPI)

One-shot: run the CLI without installing. uvx fetches heliaAOT into an ephemeral environment and executes it, which is best for trial conversions and CI smoke checks.

uvx --python python3.12 helia-aot convert \
    --model.path ./model.tflite \
    --platform.name apollo510_evb

Persistent tool: install heliaAOT as a CLI on your PATH with uv tool or pipx. It updates atomically and stays isolated from your project environments.

uv tool install --python python3.12 helia-aot
# or: pipx install --python python3.12 helia-aot

helia-aot convert \
    --model.path ./model.tflite \
    --platform.name apollo510_evb

Library: add heliaAOT to a Python environment with pip or uv pip. This is ideal for scripting, notebooks, and pipelines that import helia_aot directly.

python -m pip install helia-aot
# or: uv pip install helia-aot

helia-aot convert \
    --model.path ./model.tflite \
    --platform.name apollo510_evb

Same model, no rewrite · Already on heliaRT? Drop in the same .tflite: no retraining and no API changes required.

Pick your target · Apollo510 EVB by default. Swap targets instantly with --platform.name.

Choose packaging · CMake, Zephyr, neuralSPOT, or CMSIS-Pack; one flag selects the output format.

Hardware-aware compilation

Made for Ambiq silicon.

heliaAOT understands the Cortex-M variant, vector ISA, DSP extensions, and memory hierarchy of every Ambiq SoC it targets. That knowledge is compiled in — long before your firmware build kicks off.

Cortex-M55 · Helium, engaged.

On Apollo510, heliaAOT auto-selects MVE-tuned int8 and int16 kernels for conv, matmul, and elementwise paths. Vectorized throughput, zero intrinsics to hand-write.

Cortex-M4F · DSP-fluent.

On Apollo3 and Apollo4, heliaAOT routes through Arm DSP and CMSIS-NN paths matched to the core's SIMD extensions — not scalar reference code.

Memory model · Memory, all of it.

ITCM, DTCM, SRAM, MRAM, PSRAM. The planner knows each tier's size and priority — and where your activations, constants, and persistent state actually belong.
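To make "size and priority" concrete, here is a minimal first-fit sketch in Python: buffers are taken largest first, and each goes to the fastest memory tier it is allowed in that still has room. This is an illustration under stated assumptions, not heliaAOT's planner: the `Tier`/`Buffer` classes and the `place` helper are hypothetical, the capacities are made-up round numbers rather than Apollo datasheet values, and only three of the tiers named above are modeled.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    capacity: int  # bytes available (hypothetical figures, not datasheet values)
    priority: int  # lower value = faster, preferred memory
    used: int = 0


@dataclass
class Buffer:
    name: str
    size: int                 # bytes
    allowed: tuple[str, ...]  # tiers this buffer may be placed in


def place(buffers: list[Buffer], tiers: list[Tier]) -> dict[str, str]:
    """First-fit by priority: largest buffers first, fastest allowed tier with room."""
    by_priority = sorted(tiers, key=lambda t: t.priority)
    placement: dict[str, str] = {}
    for buf in sorted(buffers, key=lambda b: b.size, reverse=True):
        for tier in by_priority:
            if tier.name in buf.allowed and tier.used + buf.size <= tier.capacity:
                tier.used += buf.size
                placement[buf.name] = tier.name
                break
        else:  # no tier accepted the buffer
            raise MemoryError(f"no tier can hold {buf.name} ({buf.size} B)")
    return placement


tiers = [
    Tier("DTCM", capacity=64 * 1024, priority=0),
    Tier("SRAM", capacity=256 * 1024, priority=1),
    Tier("MRAM", capacity=1024 * 1024, priority=2),
]
buffers = [
    Buffer("scratch", 48 * 1024, ("DTCM", "SRAM")),
    Buffer("activations", 40 * 1024, ("DTCM", "SRAM")),
    Buffer("weights", 300 * 1024, ("MRAM",)),
]
print(place(buffers, tiers))
# {'weights': 'MRAM', 'scratch': 'DTCM', 'activations': 'SRAM'}
```

In this toy run the activations spill to SRAM because scratch already took most of DTCM. A production planner would also weigh access frequency and tensor lifetimes, which this sketch ignores.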

Per-tensor control · Arenas, your way.

Multi-arena layout out of the box. Override placement per-tensor or per-role, stage weights between memories, or hand arena ownership to the application.

AP3: Cortex-M4F · DSP
AP4: Cortex-M4F · DSP
AP5: Cortex-M55 · Helium/MVE

Progressive control

Maximum efficiency.
Maximum flexibility.

Most projects ship by pointing the CLI at a model and a platform — sensible defaults handle the rest. When you need more, the same compiler exposes per-tensor placement, per-arena memory routing, and per-operator overrides. No plugin layer, no second tool.

Simple · One line, one platform.

helia-aot convert \
    --model.path model.tflite \
    --platform.name apollo510_evb

Sensible defaults: planner picks arenas, kernels match the SoC, weights stay in MRAM.

Tuned · Pin arenas to TCM.

memory:
  planner: greedy
  tensors:
    - type: scratch
      attributes:
        memory: DTCM
    - type: constant
      attributes:
        memory: MRAM
        constant_destination_memory: DTCM

Push the hot scratch arena into DTCM; stage constants from MRAM into DTCM at boot.
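The `planner: greedy` setting names a greedy arena planner, but this page does not describe its exact heuristic. The Python sketch below shows one common greedy approach, assumed here purely for illustration: tensors whose lifetimes overlap must not share bytes, so each tensor (largest first) takes the lowest offset that does not collide with an already-placed tensor that is alive at the same time. The `Tensor` fields, `greedy_plan` helper, and the toy graph are all hypothetical.

```python
from typing import NamedTuple


class Tensor(NamedTuple):
    name: str
    size: int   # bytes
    first: int  # index of the first op that touches the tensor
    last: int   # index of the last op that touches the tensor


def overlaps(a: Tensor, b: Tensor) -> bool:
    """True if both tensors are alive during at least one common op."""
    return a.first <= b.last and b.first <= a.last


def greedy_plan(tensors: list[Tensor]) -> tuple[dict[str, int], int]:
    """Largest-first offset assignment; returns (offsets, arena size in bytes)."""
    placed: list[tuple[Tensor, int]] = []
    offsets: dict[str, int] = {}
    for t in sorted(tensors, key=lambda x: x.size, reverse=True):
        # Byte ranges held by already-placed tensors whose lifetimes overlap t.
        busy = sorted((off, off + p.size) for p, off in placed if overlaps(p, t))
        offset = 0
        for start, end in busy:
            if offset + t.size <= start:
                break  # t fits in the gap before this range
            offset = max(offset, end)
        placed.append((t, offset))
        offsets[t.name] = offset
    arena_size = max((off + t.size for t, off in placed), default=0)
    return offsets, arena_size


# Hypothetical 3-op chain: op0 input->conv1_out, op1 conv1_out->conv2_out, op2 conv2_out->output
tensors = [
    Tensor("input", 1000, 0, 0),
    Tensor("conv1_out", 2000, 0, 1),
    Tensor("conv2_out", 2000, 1, 2),
    Tensor("output", 500, 2, 2),
]
offsets, arena_size = greedy_plan(tensors)
print(offsets)     # {'conv1_out': 0, 'conv2_out': 2000, 'input': 2000, 'output': 0}
print(arena_size)  # 4000, versus 5500 bytes if every tensor had its own slot
```

Because `input` is dead before `conv2_out` is written, and `conv1_out` is dead before `output` is written, their bytes are reused, which is where the savings over one slot per tensor come from.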

Advanced · Per-op kernel placement.

operators:
  - type: "*"
    attributes:
      code_placement: MRAM
      scratch_placement: SRAM

  - type: CONV_2D
    attributes:
      code_placement: ITCM
      scratch_placement: DTCM

  - type: CONV_2D
    id: "9"
    attributes: { code_placement: MRAM }

Wildcard defaults, type-level overrides, and per-instance exceptions are resolved by specificity: the most specific matching rule wins.
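A minimal Python sketch of specificity resolution applied to the three rules in the YAML above. It assumes a wildcard rule scores 0, a type rule 1, and a type-plus-`id` rule 2, and that attributes merge key by key with more specific rules overriding individual keys. Whether heliaAOT merges or replaces whole attribute sets is not stated on this page, so treat the merge behavior and the `resolve` helper as hypothetical.

```python
# Rules mirroring the YAML above; key-by-key merging is an assumption.
RULES = [
    {"type": "*", "attributes": {"code_placement": "MRAM", "scratch_placement": "SRAM"}},
    {"type": "CONV_2D", "attributes": {"code_placement": "ITCM", "scratch_placement": "DTCM"}},
    {"type": "CONV_2D", "id": "9", "attributes": {"code_placement": "MRAM"}},
]


def specificity(rule: dict) -> int:
    """0 = wildcard, 1 = operator type, 2 = operator type plus instance id."""
    if rule["type"] == "*":
        return 0
    return 2 if "id" in rule else 1


def matches(rule: dict, op_type: str, op_id: str) -> bool:
    type_ok = rule["type"] in ("*", op_type)
    id_ok = "id" not in rule or rule["id"] == op_id
    return type_ok and id_ok


def resolve(op_type: str, op_id: str, rules: list[dict] = RULES) -> dict:
    """Apply matching rules from least to most specific; later keys win."""
    resolved: dict = {}
    for rule in sorted((r for r in rules if matches(r, op_type, op_id)), key=specificity):
        resolved.update(rule["attributes"])
    return resolved


print(resolve("CONV_2D", "3"))          # {'code_placement': 'ITCM', 'scratch_placement': 'DTCM'}
print(resolve("CONV_2D", "9"))          # {'code_placement': 'MRAM', 'scratch_placement': 'DTCM'}
print(resolve("FULLY_CONNECTED", "4"))  # {'code_placement': 'MRAM', 'scratch_placement': 'SRAM'}
```

Under this assumed merge, CONV_2D instance 9 moves its code back to MRAM but keeps the type-level DTCM scratch placement.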

Programmatic access

A compiler your tooling can drive.

heliaAOT is a Python library first, CLI second. Import AotConverter in a notebook, sweep configs in CI, register custom ops and kernels — everything the command line does is one function call away.

Custom ops · Bring your own operators.

Register LiteRT parsers and AOT operator classes through a RegistryContext customizer — one canonical key wires the parser, the AIR op_type, and the codegen handler.

Custom kernels · Swap in your own implementation.

Override built-in handlers with allow_override=True to ship hand-tuned kernels — vendor intrinsics, accelerator dispatch, or a profiled fast-path for one hot op.

CI / pipelines · Automate every build.

Sweep platforms, planners, and quantizations from a single Python harness. Diff generated C between runs, gate firmware on size/latency budgets, regenerate on every model checkpoint.

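from helia_aot.converter import AotConverter
from helia_aot.registry.context import RegistryContext, build_default_registry_context

# Your own plugin package (example name): a LiteRT parser and an AOT operator class.
from my_plugin.ops import parse_custom_fft, CustomFftOperator


def customize_registry(context: RegistryContext) -> None:
    # One canonical key ("CUSTOM_FFT") wires the LiteRT parser to the AOT operator class.
    context.litert_parsers.register("CUSTOM_FFT", parse_custom_fft, overwrite=True)
    context.aot_operator_classes.register("CUSTOM_FFT", CustomFftOperator, overwrite=True)


# Start from the built-in registries, apply the customizer, and let it
# replace built-in entries.
registry_context = build_default_registry_context(
    customizers=[customize_registry],
    allow_override=True,
)

# `config` is your conversion configuration (model, platform, memory, operators);
# constructing it is not shown here.
AotConverter(config, registry_context=registry_context).convert()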
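For the "gate firmware on size budgets" step mentioned under CI / pipelines, one possible check (a generic sketch, not a heliaAOT feature) parses the Berkeley-format output of GNU `arm-none-eabi-size` for your linked firmware and fails the job when approximate flash (text + data) or static RAM (data + bss) exceeds a budget. The helper names, the ELF path, the sample numbers, and the budgets are all hypothetical.

```python
def parse_berkeley_size(output: str) -> dict[str, int]:
    """Turn GNU `size` Berkeley output (header + one row) into flash/RAM totals."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    columns = dict(zip(lines[0].split(), lines[1].split()))
    text, data, bss = (int(columns[k]) for k in ("text", "data", "bss"))
    # .data is stored in flash and copied to RAM at startup, so it counts in both.
    return {"flash": text + data, "ram": data + bss}


def check_budgets(usage: dict[str, int], budgets: dict[str, int]) -> list[str]:
    """One message per exceeded budget; an empty list means the build passes."""
    return [
        f"{key}: {usage[key]} B exceeds budget of {limit} B"
        for key, limit in budgets.items()
        if usage[key] > limit
    ]


if __name__ == "__main__":
    # In CI, capture this from `arm-none-eabi-size build/firmware.elf` (path hypothetical).
    sample = (
        "   text\t   data\t    bss\t    dec\t    hex\tfilename\n"
        "  81234\t   1024\t  40960\t 123218\t  1e152\tbuild/firmware.elf\n"
    )
    budgets = {"flash": 128 * 1024, "ram": 48 * 1024}  # hypothetical budgets
    usage = parse_berkeley_size(sample)
    problems = check_budgets(usage, budgets)
    for problem in problems:
        print("FAIL", problem)
    if problems:
        raise SystemExit(1)  # non-zero exit fails the CI job
    print("size budgets OK:", usage)  # size budgets OK: {'flash': 82258, 'ram': 41984}
```

Running the same gate once per platform or planner setting in a sweep gives a per-configuration pass/fail table.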


What you get

Built for real firmware, not just model export.

Deterministic C

Every operator, tensor, and arena lives in generated source you can read and review before the build runs.

Small footprint

No interpreter, no resolver, no unused kernels — only the code your specific graph actually needs.

Multi-model ready

Share constant arenas or take external ownership of buffers for multi-model firmware patterns.
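The arithmetic behind handing buffer ownership to the application is simple, sketched here under the assumption that the models never run concurrently: private per-model scratch arenas cost the sum of their sizes, while one application-owned buffer reused by each model in turn only needs to cover the largest request. The model names and sizes are hypothetical, and alignment padding is ignored.

```python
def scratch_bytes(model_scratch: dict[str, int], sequential: bool) -> int:
    """Scratch RAM for several models (alignment padding ignored).

    Private per-model buffers cost the sum of the requests. One
    application-owned buffer handed to models that never run at the same
    time only has to cover the largest request.
    """
    sizes = list(model_scratch.values())
    return max(sizes, default=0) if sequential else sum(sizes)


# Hypothetical per-model scratch requirements, in bytes.
models = {"kws": 24 * 1024, "har": 16 * 1024, "denoise": 40 * 1024}
print(scratch_bytes(models, sequential=False))  # 81920 (three private buffers)
print(scratch_bytes(models, sequential=True))   # 40960 (one shared buffer)
```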

Ethos-U aware

Vela-compiled subgraphs collapse into a single Ethos-U operator with command-stream wiring done for you.

Extensible

Register custom LiteRT op parsers and AOT handlers for product-specific operators or kernels.

CI-native

Deterministic output means you can diff generated C between commits, gate on size budgets, and regenerate on every model checkpoint.
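Because identical inputs produce identical output, a cheap CI check is to fingerprint the generated directory and compare it with the fingerprint from the previous commit; a mismatch means the generated C changed and the diff deserves review. The `tree_digest` helper below is a generic, hypothetical sketch (not part of helia_aot) that hashes every file's relative path and contents in a stable order.

```python
import hashlib
import tempfile
from pathlib import Path


def tree_digest(root: Path) -> str:
    """SHA-256 over every file's relative path and bytes, in a stable order."""
    h = hashlib.sha256()
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    for path in files:
        data = path.read_bytes()
        h.update(path.relative_to(root).as_posix().encode("utf-8") + b"\0")
        h.update(len(data).to_bytes(8, "big"))  # length prefix keeps file boundaries unambiguous
        h.update(data)
    return h.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        (out / "src").mkdir()
        model_c = out / "src" / "model.c"  # stand-in for a generated source file
        model_c.write_text("int x = 1;\n")
        before = tree_digest(out)
        model_c.write_text("int x = 2;\n")
        print(before != tree_digest(out))  # True: the generated C changed
```

Sorting by the POSIX-style relative path keeps the digest identical across Linux, macOS, and Windows runners for byte-identical files.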

Packaging

Ship into the build system you already use.

3 toolchains

GCC · arm-none-eabi-gcc on Linux, macOS, and Windows.

armclang · Arm Compiler 6 with full Helium codegen.

ATfE · Arm Toolchain for Embedded (LLVM).

4 build systems

CMSIS-Pack · Open-CMSIS-Pack archive with its .pdsc descriptor, ready for cpackget.

CMake · Static library + headers for any Cortex-M firmware build.

Zephyr · Drop-in west module with Kconfig wiring.

NSX / neuralSPOT · Native deployment path for Ambiq EVBs.

3 × 4 = 12 validated build combinations, tested in CI on every release.

HELIA AI platform

Two paths onto Ambiq silicon. Pick the one you're ready for.

heliaRT gives you a tuned LiteRT runtime — keep the MicroInterpreter API and swap kernels underneath. heliaAOT removes the interpreter entirely: same .tflite input, but the output is static C the linker can place wherever you want. Both share heliaCORE's optimized kernel library.


heliaAOT · Ahead-of-time compiler · static C

heliaRT · LiteRT runtime · tuned kernels

heliaCORE · Kernel layer powering both heliaRT and heliaAOT

Keep going

Pick a next step.

One convert command got you a module. From here, dial in the build, tune the kernels, or measure what landed on the device.

01 · Install: Set up the CLI. pipx, uv tool, uvx, or pip: pick the workflow that fits your team.

02 · How-to: Task-focused recipes. Package for CMake, Zephyr, neuralSPOT, or CMSIS-Pack, and register custom ops.

03 · Reference: CLI and attributes. Every command, platform target, registry, and artifact, documented and versioned.

04 · Benchmarks: See it on silicon. Footprint and latency on Apollo4 and Apollo510, heliaAOT vs heliaRT head to head.