Benchmarks
MLPerf Tiny Benchmark
MLPerf Tiny is a community-driven benchmark suite from MLCommons designed to measure the performance of machine learning workloads on highly resource-constrained devices such as microcontrollers. It covers five representative models: Anomaly Detection (AD), Keyword Spotting (KWS), Image Classification (IC), Streaming Keyword Spotting (STRM), and Visual Wake Words (VWW). Together these models exercise the core operators found in TinyML inference (Add, Average Pooling, Convolution, Depthwise Convolution, Fully Connected, Reshape, and Softmax). Each model represents a different use case: AD detects deviations in time-series sensor data, KWS and STRM recognize spoken keywords (with STRM adding real-time handling of a continuous audio stream), IC classifies small images (e.g., 32×32 pixels), and VWW performs a binary “person present” classification on camera data. By standardizing on these workloads, MLPerf Tiny enables apples-to-apples comparisons across software frameworks, compiler toolchains, and hardware platforms.
When we run these models on the Ambiq Apollo510 EVB using HeliosRT v1.3.0 versus HeliosAOT v0.2.2, the results highlight a classic trade-off between flexibility and efficiency. The interpreter-based HeliosRT build consumes roughly 400 – 750 KB of combined flash and RAM, depending on the model, and delivers inference latencies from 292 µs (≈ 3.4 k inferences/sec on AD) up to 29.5 ms (≈ 34 inferences/sec on VWW). In contrast, the ahead-of-time compiled HeliosAOT build cuts total memory usage by 35 – 62 %, dropping the footprint to 173 – 491 KB, while achieving comparable or slightly higher throughput (for example, AD rises from ≈ 3.4 k to ≈ 3.6 k inf/sec and VWW from ≈ 34 to ≈ 34.1 inf/sec).
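Throughput here is simply the reciprocal of single-inference latency. As a quick sanity check on the figures above, here is a minimal Python sketch (latency values are copied from the results table below):

```python
# Convert single-inference latency (µs) to throughput (inferences/sec).
# Latencies are taken from the results table in this section.
latencies_us = {
    "AD (HeliosRT)": 292,
    "AD (HeliosAOT)": 275,
    "VWW (HeliosRT)": 29_480,
    "VWW (HeliosAOT)": 29_327,
}

for name, us in latencies_us.items():
    print(f"{name}: {1e6 / us:,.1f} inf/sec")

# Output:
# AD (HeliosRT): 3,424.7 inf/sec    (≈ 3.4 k)
# AD (HeliosAOT): 3,636.4 inf/sec   (≈ 3.6 k)
# VWW (HeliosRT): 33.9 inf/sec      (≈ 34)
# VWW (HeliosAOT): 34.1 inf/sec
```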
These gains make HeliosAOT particularly attractive for production deployments on battery-powered MCUs where every kilobyte counts, while HeliosRT remains the go-to choice when you need on-device model loading or dynamic graph support without reflashing.
Operators
- Add
- Average Pooling
- Convolution
- Depthwise Convolution
- Fully Connected
- Reshape
- Softmax
AI Models
- Anomaly Detection (AD)
- Keyword Spotting (KWS)
- Image Classification (IC)
- Streaming Keyword Spotting (STRM)
- Visual Wake Words (VWW)
Results
The following table summarizes the memory usage and inference latency for each model when run with the HeliosRT and HeliosAOT builds. TEXT, DATA, and BSS are the standard linker section sizes, and DEC is their sum, i.e., the total footprint. The values in parentheses indicate the percentage reduction relative to the HeliosRT build.
MODEL | BUILD | TEXT (KB) | DATA (KB) | BSS (KB) | DEC (KB) | Latency (µs) |
---|---|---|---|---|---|---|
AD | HeliosRT | 201 | 279 | 117 | 596 | 292 |
AD | HeliosAOT | 58 (71%) | 272 (3%) | 30 (74%) | 360 (40%) | 275 |
KWS | HeliosRT | 201 | 61 | 137 | 399 | 8,085 |
KWS | HeliosAOT | 83 (59%) | 31 (49%) | 59 (57%) | 173 (57%) | 8,074 |
IC | HeliosRT | 211 | 105 | 163 | 478 | 20,139 |
IC | HeliosAOT | 78 (63%) | 84 (20%) | 79 (52%) | 241 (50%) | 20,041 |
STRM | HeliosRT | 211 | 82 | 208 | 500 | 1,730 |
STRM | HeliosAOT | 84 (60%) | 54 (34%) | 52 (75%) | 189 (62%) | 1,719 |
VWW | HeliosRT | 200 | 334 | 217 | 751 | 29,480 |
VWW | HeliosAOT | 107 (47%) | 222 (34%) | 163 (25%) | 491 (35%) | 29,327 |
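The column names follow the convention of the GNU `size` tool, where DEC is the decimal sum of TEXT, DATA, and BSS. Assuming that convention, the DEC totals and reduction percentages can be re-derived from the per-section values; a minimal sketch with the table's numbers hard-coded:

```python
# (text, data, bss) in KB for each model, copied from the table above.
sizes = {
    "AD":   {"HeliosRT": (201, 279, 117), "HeliosAOT": (58, 272, 30)},
    "KWS":  {"HeliosRT": (201, 61, 137),  "HeliosAOT": (83, 31, 59)},
    "IC":   {"HeliosRT": (211, 105, 163), "HeliosAOT": (78, 84, 79)},
    "STRM": {"HeliosRT": (211, 82, 208),  "HeliosAOT": (84, 54, 52)},
    "VWW":  {"HeliosRT": (200, 334, 217), "HeliosAOT": (107, 222, 163)},
}

for model, builds in sizes.items():
    dec_rt = sum(builds["HeliosRT"])    # total footprint, HeliosRT
    dec_aot = sum(builds["HeliosAOT"])  # total footprint, HeliosAOT
    reduction = 100 * (dec_rt - dec_aot) / dec_rt
    print(f"{model}: {dec_rt} KB -> {dec_aot} KB ({reduction:.0f}% reduction)")

# Small (±1 KB / ±1 %) differences versus the table come from the
# per-column values being rounded to whole KB before summing.
```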
The following chart shows the memory usage for each model and build.
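A chart of this kind can be regenerated directly from the table. As a minimal matplotlib sketch (the DEC values are hard-coded from the table above; the output filename is illustrative):

```python
import matplotlib.pyplot as plt
import numpy as np

# Total memory footprint (DEC, KB) per model, from the results table.
models = ["AD", "KWS", "IC", "STRM", "VWW"]
helios_rt = [596, 399, 478, 500, 751]
helios_aot = [360, 173, 241, 189, 491]

x = np.arange(len(models))
width = 0.35

fig, ax = plt.subplots()
ax.bar(x - width / 2, helios_rt, width, label="HeliosRT")
ax.bar(x + width / 2, helios_aot, width, label="HeliosAOT")
ax.set_xticks(x)
ax.set_xticklabels(models)
ax.set_ylabel("Total footprint (KB)")
ax.set_title("MLPerf Tiny memory usage: HeliosRT vs HeliosAOT")
ax.legend()
plt.savefig("memory_usage.png", dpi=150)
```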