Operator & Kernel Coverage

heliaCORE is organized around operator families and kernel variants. Operators are the public neural-network operations users care about (Conv2D, Pad, Softmax, LeakyReLU, and so on). Kernel variants are the implementations selected for a target CPU, data type, and acceleration path.

This page does not document every function. For exact prototypes, parameters, scratch-buffer queries, and per-function behavior, use the generated API reference described in API.

Acceleration paths

Kernel selection happens at compile time and follows the feature flags produced by the compiler and build system for Ambiq targets.

| Path | Typical Ambiq use | Notes |
| --- | --- | --- |
| Pure C | Baseline path for smallest Cortex-M targets. | Always available; keeps functionality portable inside Ambiq builds. |
| DSP | Apollo-class Cortex-M targets with DSP extension. | Uses SIMD-style DSP intrinsics where they help latency and code size. |
| MVE / Helium | Cortex-M55-class Apollo targets. | Primary acceleration focus where MVE is present; vectorized kernels target real HELIA workloads. |
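
As a rough illustration of what compile-time selection between these paths can look like, the sketch below keys off the Arm C Language Extensions feature macros `__ARM_FEATURE_MVE` and `__ARM_FEATURE_DSP`, which the compiler defines from the target flags. The `EXAMPLE_*` names and both functions are hypothetical and are not part of the heliaCORE API; the macros and guards heliaCORE actually uses may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical path identifiers, used only in this sketch. */
#define EXAMPLE_PATH_PURE_C 0
#define EXAMPLE_PATH_DSP    1
#define EXAMPLE_PATH_MVE    2

/* ACLE feature macros are defined by the compiler from the target flags. */
#if defined(__ARM_FEATURE_MVE) && (__ARM_FEATURE_MVE & 1)
#define EXAMPLE_PATH EXAMPLE_PATH_MVE      /* integer MVE (Helium) present */
#elif defined(__ARM_FEATURE_DSP) && (__ARM_FEATURE_DSP == 1)
#define EXAMPLE_PATH EXAMPLE_PATH_DSP      /* DSP extension present */
#else
#define EXAMPLE_PATH EXAMPLE_PATH_PURE_C   /* portable fallback */
#endif

/* Report which path this translation unit was compiled for. */
static int example_selected_path(void)
{
    return EXAMPLE_PATH;
}

/* Saturating int8 add. Only the portable loop is written out here; a real
   library would swap in intrinsic-based bodies for the DSP and MVE builds
   behind the same preprocessor checks. */
static void example_add_s8_sat(const int8_t *a, const int8_t *b,
                               int8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int32_t sum = (int32_t)a[i] + (int32_t)b[i];
        if (sum > INT8_MAX) sum = INT8_MAX;
        if (sum < INT8_MIN) sum = INT8_MIN;
        out[i] = (int8_t)sum;
    }
}
```

Because the choice is made by the preprocessor, the unused variants never reach the binary, and there is no runtime dispatch cost.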

Operator families

The table below is intentionally grouped. It is meant to help users understand coverage shape without duplicating the full API reference.

| Family | Examples | Coverage notes |
| --- | --- | --- |
| Convolution | Conv2D, DepthwiseConv2D, TransposeConv2D | int8 is the common path; selected operators also include int16 and int4-weight variants. |
| Dense / matrix | Fully Connected, Batch MatMul, SVDF | Includes helper kernels under NNSupportFunctions/ for matmul and requantization paths. |
| Recurrent | LSTM, SVDF | Used by speech and sequence workloads in Ambiq model flows. |
| Activation | ReLU, ReLU6, LeakyReLU, PReLU, Hard-Swish, Logistic, Tanh, Clamp | Includes Ambiq coverage for glue operators that appear frequently in field-like models. |
| Elementwise / math | Add, Sub, Mul, Min/Max, Abs, Squared Difference, Sqrt/Rsqrt, Mean | Often small individually, but can matter for end-to-end latency. |
| Comparison / reduction | Equal/NotEqual/Less/Greater, ArgMin/ArgMax, Reduce Min/Max | Maintains quantized operator coverage used by embedded inference graphs. |
| Pooling / softmax | MaxPool, AvgPool, Softmax | Common post-convolution and classifier tail operations. |
| Data movement | Pad, Transpose, Reshape, Concatenation, Split, StridedSlice, Gather/GatherND | Important in real graphs; these operators can dominate if left unoptimized. |
| Quantization | Quantize, Dequantize, requantization helpers | Supports TFLM-style affine quantization and HELIA deployment paths. |
| Layout transforms | Resize Nearest Neighbor, Space-to-Batch, Batch-to-Space, Space-to-Depth, Depth-to-Space | Provides coverage for model-conversion and graph-shaping patterns. |
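
For readers unfamiliar with the affine quantization mentioned in the Quantization row, the sketch below shows the standard mapping real = scale * (q - zero_point) for int8. The function names are hypothetical and are not heliaCORE prototypes. It uses floating point for readability; integer kernels typically express the same scaling with a fixed-point multiplier and shift instead.

```c
#include <stdint.h>

/* Hypothetical helper: map a real value to int8 using
   q = round(x / scale) + zero_point, saturated to [-128, 127]. */
static int8_t example_quantize_s8(float x, float scale, int32_t zero_point)
{
    float scaled = x / scale;
    /* Round half away from zero without depending on libm. */
    int32_t q = (int32_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
    q += zero_point;
    if (q > INT8_MAX) q = INT8_MAX;
    if (q < INT8_MIN) q = INT8_MIN;
    return (int8_t)q;
}

/* Hypothetical helper: recover the real value,
   real = scale * (q - zero_point). */
static float example_dequantize_s8(int8_t q, float scale, int32_t zero_point)
{
    return scale * (float)((int32_t)q - zero_point);
}
```

With scale 0.5 and zero point -3, the real value 1.0 quantizes to -1 and dequantizes back to 1.0; values outside the representable range saturate rather than wrap.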

Data types

heliaCORE focuses on quantized inference for Ambiq devices:

  • int8 activations / weights for the broadest operator surface.

  • int16 activations where models need wider activation precision.

  • int4 weights with int8 activations for selected high-value kernels such as Conv2D, DepthwiseConv2D, and Fully Connected.

Exact dtype support is per function and per variant. Check the generated API reference before depending on a specific dtype/backend combination.
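
To make the int4-weight case concrete, the sketch below unpacks signed 4-bit weights stored two per byte and accumulates them against int8 activations into an int32 sum. The packing order (low nibble holds the even-indexed weight) is an assumption made for this example, and the function names are hypothetical; confirm the actual layout in the generated API reference.

```c
#include <stddef.h>
#include <stdint.h>

/* Sign-extend a 4-bit two's-complement nibble to int8 (range -8..7). */
static int8_t example_sign_extend_s4(uint8_t nibble)
{
    return (int8_t)(((nibble & 0x0F) ^ 0x08) - 0x08);
}

/* Dot product of n packed int4 weights with n int8 activations.
   Assumed layout: weight 2k in the low nibble of byte k,
   weight 2k+1 in the high nibble of byte k. */
static int32_t example_dot_s4_s8(const uint8_t *packed_weights,
                                 const int8_t *activations, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t byte = packed_weights[i / 2];
        uint8_t nibble = (i & 1) ? (uint8_t)(byte >> 4) : (uint8_t)(byte & 0x0F);
        acc += (int32_t)example_sign_extend_s4(nibble) * (int32_t)activations[i];
    }
    return acc;
}
```

Packing halves weight storage relative to int8, at the cost of an unpack step in the inner loop, which is one reason this variant tends to be offered only for selected kernels.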

Reading the source tree

The source layout mirrors operator families:

| Directory | Purpose |
| --- | --- |
| Source/ConvolutionFunctions/ | Conv-family kernels and variants. |
| Source/FullyConnectedFunctions/ | Dense layer kernels. |
| Source/ActivationFunctions/ | Activation and clamp-style operators. |
| Source/NNSupportFunctions/ | Shared matmul, requantization, and helper kernels. |
| Source/PadFunctions/, Source/TransposeFunctions/, Source/ReshapeFunctions/ | Data-movement and graph-shaping operators. |

The public C API is declared primarily in Include/arm_nnfunctions.h; helper APIs are declared in Include/arm_nnsupportfunctions.h.

Scope

heliaCORE is intended for HELIA AI workflows on Ambiq silicon. Arm CMSIS-NN remains the vendor-neutral upstream ecosystem reference for general Cortex-M kernel development.