Zephyr + heliaRT

Prerequisite

This guide assumes you already have a working Zephyr development environment (Zephyr repo, west, SDK). If not, follow the Zephyr Getting Started Guide first.

This example shows the minimal steps to create, build, and run a Zephyr application that uses heliaRT.

Choose exactly one of the following three integration paths:

source modules: helia-rt + open cmsis-nn
source modules: helia-rt + ns-cmsis-nn
prebuilt release module: a single helia-rt archive with heliaRT + ns-cmsis-nn already linked in

This guide assumes a workspace like:

<ws>/
├── zephyr/
├── modules/
└── app/
    └── helia_rt_app/
        ├── CMakeLists.txt
        ├── prj.conf
        └── src/
            └── main.cpp

Known-good versions:

Zephyr: 4.3
Zephyr SDK: zephyr-sdk-1.0.1

Note

Do not add both the source-module and prebuilt-release variants to the same app.

The three paths are covered in turn: Source Modules + CMSIS-NN, Source Modules + HELIA, and Prebuilt Release Module.

Source Modules + CMSIS-NN

1. Fetch the modules

Open cmsis-nn is already provided by the standard Zephyr west manifest at modules/lib/cmsis-nn, so you only need to add helia-rt to your workspace west.yml projects list. (If your workspace does not inherit the standard manifest, add cmsis-nn from github.com/zephyrproject-rtos/cmsis-nn as a second project with path: modules/lib/cmsis-nn.)

Add helia-rt:

- name: helia-rt
  url: https://github.com/AmbiqAI/helia-rt
  revision: <helia-rt-version>   # e.g. helia-rt-v1.16.0
  path: modules/helia-rt

Then fetch modules:

Note

If this is your first time setting up the workspace, run west update to fetch all modules. If you already have a workspace and are only adding helia-rt, west update helia-rt cmsis-nn fetches just those modules without re-downloading everything else.

west update helia-rt cmsis-nn

Result:

<ws>/
├── zephyr/
├── modules/
│   ├── helia-rt/
│   └── lib/
│       └── cmsis-nn/
└── app/
    └── helia_rt_app/

2. Create the application

CMakeLists.txt

cmake_minimum_required(VERSION 3.20.0)

find_package(Zephyr REQUIRED HINTS $ENV{ZEPHYR_BASE})
project(helia_rt_app)

set(NO_THREADSAFE_STATICS $<TARGET_PROPERTY:compiler-cpp,no_threadsafe_statics>)
zephyr_compile_options($<$<COMPILE_LANGUAGE:CXX>:${NO_THREADSAFE_STATICS}>)

target_sources(app PRIVATE src/main.cpp)

prj.conf

CONFIG_STD_CPP17=y

CONFIG_PRINTK=y
CONFIG_CONSOLE=y
CONFIG_UART_CONSOLE=y

CONFIG_HELIA_RT=y
CONFIG_NS_CMSIS_NN=n
CONFIG_CMSIS_NN=y
CONFIG_CMSIS_NN_CONVOLUTION=y
CONFIG_CMSIS_NN_FULLYCONNECTED=y

Required heliaRT-specific settings:

CONFIG_HELIA_RT=y
CONFIG_NS_CMSIS_NN=n (disable the HELIA backend that heliaRT enables by default)
CONFIG_CMSIS_NN=y and per-op kernel configs for your model

Notes:

With west-managed modules, no ZEPHYR_EXTRA_MODULES is needed.
Do not add open cmsis-nn to ZEPHYR_EXTRA_MODULES when it already comes from the standard west workspace.
In the default Zephyr workspace layout, open cmsis-nn is discovered at modules/lib/cmsis-nn.
Open cmsis-nn does not provide an ALL Kconfig switch in Zephyr. You must enable the CMSIS-NN kernel groups your model needs.

Do not enable HELIA-only settings on this path:

CONFIG_NS_CMSIS_NN

Source Modules + HELIA

1. Fetch the modules

Add both helia-rt and ns-cmsis-nn as west projects in your workspace's west.yml:

- name: helia-rt
  url: https://github.com/AmbiqAI/helia-rt
  revision: <helia-rt-version>     # e.g. helia-rt-v1.16.0
  path: modules/helia-rt

- name: ns-cmsis-nn
  url: https://github.com/AmbiqAI/ns-cmsis-nn
  revision: <ns-cmsis-nn-version>  # e.g. v7.25.0
  path: modules/ns-cmsis-nn

Then fetch both modules:

west update helia-rt ns-cmsis-nn

Result:

<ws>/
├── zephyr/
├── modules/
│   ├── helia-rt/
│   └── ns-cmsis-nn/
└── app/
    └── helia_rt_app/

2. Create the application

CMakeLists.txt

cmake_minimum_required(VERSION 3.20.0)

find_package(Zephyr REQUIRED HINTS $ENV{ZEPHYR_BASE})
project(helia_rt_app)

set(NO_THREADSAFE_STATICS $<TARGET_PROPERTY:compiler-cpp,no_threadsafe_statics>)
zephyr_compile_options($<$<COMPILE_LANGUAGE:CXX>:${NO_THREADSAFE_STATICS}>)

target_sources(app PRIVATE src/main.cpp)

prj.conf

CONFIG_STD_CPP17=y

CONFIG_PRINTK=y
CONFIG_CONSOLE=y
CONFIG_UART_CONSOLE=y

CONFIG_HELIA_RT=y

Required heliaRT-specific settings:

CONFIG_HELIA_RT=y

Optional HELIA kernel profile (choose one):

CONFIG_HELIA_RT_KERNEL_OPTIMIZE_SPEED=y (default)
CONFIG_HELIA_RT_KERNEL_OPTIMIZE_SIZE=y

Prebuilt Release Module

1. Download the prebuilt release

Download the prebuilt heliaRT release archive from the Ambiq content portal.

This bundle already contains the HELIA runtime and the ns-cmsis-nn kernel implementation inside the static archive.

After extracting it, copy the bundle into modules/:

cp -r <download-dir>/helia-rt-m55-release <ws>/modules/

Result:

<ws>/
├── zephyr/
├── modules/
│   └── helia-rt-m55-release/
└── app/
    └── helia_rt_app/

2. Create the application

CMakeLists.txt

Only add the prebuilt bundle as a Zephyr module:

cmake_minimum_required(VERSION 3.20.0)

list(APPEND ZEPHYR_EXTRA_MODULES
  ${CMAKE_CURRENT_SOURCE_DIR}/../../modules/helia-rt-m55-release
)

find_package(Zephyr REQUIRED HINTS $ENV{ZEPHYR_BASE})
project(helia_rt_app)

set(NO_THREADSAFE_STATICS $<TARGET_PROPERTY:compiler-cpp,no_threadsafe_statics>)
zephyr_compile_options($<$<COMPILE_LANGUAGE:CXX>:${NO_THREADSAFE_STATICS}>)

target_sources(app PRIVATE src/main.cpp)

prj.conf

CONFIG_STD_CPP17=y

CONFIG_PRINTK=y
CONFIG_CONSOLE=y
CONFIG_UART_CONSOLE=y

CONFIG_HELIA_RT=y
# The published prebuilt archives use hard-float calling conventions (see Notes).
CONFIG_FPU=y

Optional prebuilt flavor selection (choose one):

CONFIG_HELIA_RT_PREBUILT_BUILD_RELEASE=y (default)
CONFIG_HELIA_RT_PREBUILT_BUILD_DEBUG=y
CONFIG_HELIA_RT_PREBUILT_BUILD_RELEASE_WITH_LOGS=y

These options select the prebuilt archive's build flavor, not the HELIA kernel SPEED/SIZE profile. Current prebuilt release bundles do not publish separate SPEED/SIZE kernel-profile archives; use the source-module path if you need to choose CONFIG_HELIA_RT_KERNEL_OPTIMIZE_SPEED or CONFIG_HELIA_RT_KERNEL_OPTIMIZE_SIZE.

Do not enable these source-module-only settings with the prebuilt bundle:

CONFIG_NS_CMSIS_NN

Notes:

Do not add modules/ns-cmsis-nn to ZEPHYR_EXTRA_MODULES for the prebuilt bundle.
Enable CONFIG_FPU=y when using the prebuilt cm55 archive. The published Cortex-M55 prebuilts use hard-float calling conventions.
The prebuilt Zephyr module also forces TF_LITE_STATIC_MEMORY for the application build. That is required so your app sees the same TfLiteTensor layout as the prebuilt archive.
The prebuilt Zephyr module supports Cortex-M55, and Cortex-M4 with FPU.
The prebuilt archive selection is automatic from board CPU, toolchain, and selected flavor.
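To illustrate why the application and the prebuilt archive must agree on TF_LITE_STATIC_MEMORY, here is a minimal sketch of how a build flag that removes struct fields changes memory layout. The two structs below are hypothetical stand-ins invented for this illustration; they are not the real TfLiteTensor definition.

```cpp
#include <cstddef>
#include <cstdint>

// HYPOTHETICAL stand-ins for one struct compiled under two settings of a
// build flag. These are NOT the real TfLiteTensor fields.
struct FullTensor {        // layout seen when the flag is not defined
  void* data;
  std::int32_t type;
  const char* name;        // field that exists only without the flag
  void* extra;             // field that exists only without the flag
};

struct StaticTensor {      // layout seen when the flag is defined
  void* data;
  std::int32_t type;
};

// Byte offset of element `index` in an array, as computed by code that
// believes each element is `elem_size` bytes long.
constexpr std::size_t ElementOffset(std::size_t index, std::size_t elem_size) {
  return index * elem_size;
}
```

If the library indexes an array using one element size and the application uses the other, every element after the first is read from the wrong address. Forcing the same flag on both sides avoids that mismatch.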

Using Reference kernels

To use generic Reference kernels instead of an accelerated backend, suppress the ns-cmsis-nn auto-imply in your prj.conf:

CONFIG_STD_CPP17=y
CONFIG_HELIA_RT=y
CONFIG_NS_CMSIS_NN=n

No ns-cmsis-nn or cmsis-nn module is needed. The Reference backend is selected automatically when neither is active.

3. Minimal bring-up

After wiring the module and prj.conf, the smallest useful app flow is:

embed a .tflite flatbuffer as a C array
map it with tflite::GetModel()
register the ops your model uses
allocate tensors
write input data
call Invoke()
read the output tensor

The example below is intentionally small, but it is a real inference flow. It assumes:

one embedded model in g_model
one int8 input tensor
one int8 output tensor
a model that uses FULLY_CONNECTED

If your model uses different operators or tensor types, change the resolver and tensor access accordingly.
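For int8 tensors, "tensor access" usually includes converting between real values and quantized values. The sketch below shows the standard TFLite affine mapping as two standalone helpers. The helper names are illustrative and not part of heliaRT; the scale and zero point are assumed to come from the tensor's params field.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Map a real value to int8 with the affine scheme
//   q = round(x / scale) + zero_point, clamped to [-128, 127].
// Assumes scale > 0 and that x / scale fits in a long.
inline int8_t QuantizeToInt8(float x, float scale, int zero_point) {
  const long q = std::lround(x / scale) + zero_point;
  return static_cast<int8_t>(std::clamp<long>(q, -128, 127));
}

// Inverse mapping for reading an int8 value back as a real number:
//   x = (q - zero_point) * scale.
inline float DequantizeFromInt8(int8_t q, float scale, int zero_point) {
  return static_cast<float>(q - zero_point) * scale;
}
```

In an application these would typically be called with input->params.scale and input->params.zero_point when writing inputs, and with the output tensor's params when reading results.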

src/model_data.h

extern const unsigned char g_model[];
extern const int g_model_len;

src/model_data.cpp

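// Convert your .tflite file into a C array, for example:
//   xxd -i model.tflite > model_data.cpp
// xxd emits `unsigned char model_tflite[]` and `unsigned int model_tflite_len`;
// edit them to match model_data.h and #include it so they link externally.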

Then add src/model_data.cpp to CMakeLists.txt:

target_sources(app PRIVATE src/main.cpp src/model_data.cpp)
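A quick sanity check on the embedded array can catch a truncated or wrongly converted file before it reaches the interpreter. The sketch below tests for the "TFL3" FlatBuffers file identifier that .tflite files store at byte offset 4. The function name is an illustration, not a heliaRT or TFLite Micro API, and passing the check does not prove the model is valid.

```cpp
#include <cstddef>
#include <cstring>

// Returns true when the buffer is long enough and carries the "TFL3"
// file identifier at byte offset 4, as .tflite flatbuffers do.
inline bool LooksLikeTfliteModel(const unsigned char* data, std::size_t len) {
  constexpr std::size_t kIdOffset = 4;
  constexpr std::size_t kIdLen = 4;
  if (data == nullptr || len < kIdOffset + kIdLen) {
    return false;
  }
  return std::memcmp(data + kIdOffset, "TFL3", kIdLen) == 0;
}
```

A call such as LooksLikeTfliteModel(g_model, static_cast<std::size_t>(g_model_len)) could run before the model is mapped.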

src/main.cpp

#include <cstdint>

#include <zephyr/sys/printk.h>

#include <tensorflow/lite/micro/micro_interpreter.h>
#include <tensorflow/lite/micro/micro_mutable_op_resolver.h>
#include <tensorflow/lite/schema/schema_generated.h>

#include "model_data.h"

namespace {

constexpr int kTensorArenaSize = 96 * 1024;
alignas(16) uint8_t tensor_arena[kTensorArenaSize];

int TensorElementCount(const TfLiteTensor* tensor) {
  int count = 1;
  for (int i = 0; i < tensor->dims->size; ++i) {
    count *= tensor->dims->data[i];
  }
  return count;
}

}  // namespace

int main() {
  const tflite::Model* model = tflite::GetModel(g_model);
  if (model->version() != TFLITE_SCHEMA_VERSION) {
    printk("Model schema mismatch: %d != %d\n",
           model->version(), TFLITE_SCHEMA_VERSION);
    return 1;
  }

  tflite::MicroMutableOpResolver<1> resolver;
  resolver.AddFullyConnected();

  tflite::MicroInterpreter interpreter(
      model, resolver, tensor_arena, kTensorArenaSize);

  if (interpreter.AllocateTensors() != kTfLiteOk) {
    printk("AllocateTensors failed\n");
    return 1;
  }

  TfLiteTensor* input = interpreter.input(0);
  TfLiteTensor* output = interpreter.output(0);
  if (input == nullptr || output == nullptr) {
    printk("Missing input or output tensor\n");
    return 1;
  }

  const int input_count = TensorElementCount(input);
  for (int i = 0; i < input_count; ++i) {
    input->data.int8[i] = static_cast<int8_t>(input->params.zero_point);
  }

  if (interpreter.Invoke() != kTfLiteOk) {
    printk("Invoke failed\n");
    return 1;
  }

  const int output_count = TensorElementCount(output);
  const int preview = output_count < 8 ? output_count : 8;
  for (int i = 0; i < preview; ++i) {
    printk("out[%d] = %d\n", i, output->data.int8[i]);
  }

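  // Returning from main() is fine in Zephyr: the main thread ends and the
  // kernel keeps running. An empty while (true) loop has no side effects,
  // so the compiler is allowed to optimize it away.
  return 0;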
}
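The example prints raw int8 outputs. If the model is a classifier whose output elements are per-class scores (an assumption; the guide does not specify the model), the predicted class is the index of the largest score. A minimal sketch with an illustrative helper name:

```cpp
#include <cstdint>

// Index of the largest int8 score, or -1 for an empty or null range.
// Ties resolve to the lowest index.
inline int ArgMaxInt8(const int8_t* scores, int count) {
  if (scores == nullptr || count <= 0) {
    return -1;
  }
  int best = 0;
  for (int i = 1; i < count; ++i) {
    if (scores[i] > scores[best]) {
      best = i;
    }
  }
  return best;
}
```

With the tensors from main.cpp this would be called as ArgMaxInt8(output->data.int8, output_count). Comparing raw int8 values is enough because dequantization with a positive scale preserves ordering.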

4. Build

The examples below use Apollo510 EVB; substitute your board and app source path as needed.

Two toolchains are shown: GCC (default) and ATfE (recommended).

GCC (default)

GCC is the Zephyr default. No extra flags are required when ZEPHYR_TOOLCHAIN_VARIANT is unset or set to zephyr.

west build -p always -b apollo510_evb \
  -s app/helia_rt_app -d build/helia_rt_app_gcc

If you installed the Arm GNU Toolchain separately (outside the Zephyr SDK), set the variant explicitly:

west build -p always -b apollo510_evb \
  -s app/helia_rt_app -d build/helia_rt_app_gcc \
  -- -DZEPHYR_TOOLCHAIN_VARIANT=gnuarmemb \
     -DGNUARMEMB_TOOLCHAIN_PATH=/path/to/gcc-arm-none-eabi

ATfE (recommended)

ATfE (Arm Toolchain for Embedded) is LLVM-based and open source. On Cortex-M55 + Helium workloads it delivers up to 25 % more inferences per Joule than GCC¹, along with fewer cycles per inference.

Point LLVM_TOOLCHAIN_PATH at the ATfE install root:

west build -p always -b apollo510_evb \
  -s app/helia_rt_app -d build/helia_rt_app_atfe \
  -- -DZEPHYR_TOOLCHAIN_VARIANT=host \
     -DTOOLCHAIN_VARIANT_COMPILER=llvm \
     -DLLVM_TOOLCHAIN_PATH=/path/to/ATfE-<version> \
     -DCONFIG_LLVM_USE_LLD=y \
     -DCONFIG_COMPILER_RT_RTLIB=y

| Flag | Purpose |
| --- | --- |
| `-DZEPHYR_TOOLCHAIN_VARIANT=host` | Select the host toolchain variant |
| `-DTOOLCHAIN_VARIANT_COMPILER=llvm` | Use LLVM/Clang as the compiler within the host variant |
| `-DLLVM_TOOLCHAIN_PATH=...` | Root of the ATfE installation (contains `bin/`, `lib/`, …) |
| `-DCONFIG_LLVM_USE_LLD=y` | Use LLD instead of GNU ld |
| `-DCONFIG_COMPILER_RT_RTLIB=y` | Link compiler-rt instead of libgcc |

5. Flash

# Substitute the build directory you used in step 4
# (e.g. build/helia_rt_app_gcc or build/helia_rt_app_atfe)
west flash -d build/helia_rt_app_gcc

6. View logs

If UART console is enabled, open the board serial port at 115200 8N1.

Example (macOS device name shown; on Linux the port typically appears as /dev/ttyACM0 or similar):

screen /dev/cu.usbmodemXXXX 115200

7. Checklist

Source modules + CMSIS-NN:

modules/helia-rt exists (via west update helia-rt)
modules/lib/cmsis-nn exists (via west update cmsis-nn)
CONFIG_HELIA_RT=y is enabled
CONFIG_NS_CMSIS_NN=n suppresses the auto-imply
CONFIG_CMSIS_NN=y and per-op kernel configs are enabled
no HELIA-only Kconfig options are enabled

Source modules + HELIA:

modules/helia-rt exists (via west update helia-rt)
modules/ns-cmsis-nn exists (via west update ns-cmsis-nn)
CONFIG_HELIA_RT=y is enabled
the backend, C++ support, FPU, and ns-cmsis-nn are configured automatically

Prebuilt release module:

modules/helia-rt-m55-release exists
only that path is listed in ZEPHYR_EXTRA_MODULES
CONFIG_HELIA_RT=y is enabled
CONFIG_FPU=y is enabled for Cortex-M55 builds
no source-backend Kconfig options are enabled

For the broader setup guide, including HELIA, open CMSIS-NN, and prebuilt flows, see Zephyr setup.

¹ Measured across the MLPerf Tiny v1.1 reference suite on the Apollo510 EVB (Cortex-M55 + Helium @ 192 MHz, 10 iterations) using heliaRT v1.13.1. Latency derived from PMU cycles; energy captured with a Joulescope. Compilers: ATfE 22.1 vs arm-none-eabi-gcc 14.2. Headline "up to 25 %" refers to the inferences-per-Joule improvement on Image Classification (ResNet, +24.4 %, rounded). Every model also ran with lower latency under ATfE (4 %–13 % fewer cycles) and lower energy per inference (6 %–20 %). See Toolchains → Why ATfE for the full per-model table.
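The two energy figures in the footnote are related by simple arithmetic, since energy per inference is the reciprocal of inferences per Joule. The sketch below shows that conversion and how a PMU cycle count maps to latency at a given core clock. The function names are illustrative, and pairing the +24.4 % figure with the upper end of the 6 %–20 % range is an assumption, not something the footnote states.

```cpp
#include <cstdint>

// Fractional drop in energy per inference implied by a fractional gain in
// inferences per Joule, using energy/inference = 1 / (inferences/Joule).
inline double EnergyDropFromEfficiencyGain(double gain) {
  return 1.0 - 1.0 / (1.0 + gain);
}

// Latency in microseconds for a cycle count at the given core clock (Hz).
inline double CyclesToMicroseconds(std::uint64_t cycles, double clock_hz) {
  return static_cast<double>(cycles) * 1e6 / clock_hz;
}
```

For example, EnergyDropFromEfficiencyGain(0.244) is about 0.196, so a 24.4 % gain in inferences per Joule corresponds to roughly 19.6 % less energy per inference. At the 192 MHz clock quoted above, 192000 cycles correspond to 1000 microseconds.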