Benchmarks
MLPerf Tiny Benchmark
MLPerf Tiny is a community-driven benchmark suite from MLCommons designed to measure the performance of machine learning workloads on highly resource-constrained devices such as microcontrollers. It covers five representative models: Anomaly Detection (AD), Keyword Spotting (KWS), Image Classification (IC), Streaming Keyword Spotting (STRM), and Visual Wake Words (VWW). Together these models exercise the core operators found in TinyML inference (Add, Average Pooling, Convolution, Depthwise Convolution, Fully Connected, Reshape, and Softmax). Each model represents a different use case: AD detects deviations in time-series sensor data, KWS and STRM recognize spoken keywords (with STRM adding real-time handling of a continuous audio stream), IC classifies small images (e.g., 32×32 pixels), and VWW performs a binary “person present” classification on camera data. By standardizing on these workloads, MLPerf Tiny enables apples-to-apples comparisons across software frameworks, compiler toolchains, and hardware platforms.
When we run these models on the Ambiq Apollo510 EVB using HeliosRT v1.3.0 versus HeliosAOT v0.2.2, the results highlight a classic trade-off between flexibility and efficiency. The interpreter-based HeliosRT build consumes roughly 400 – 750 KB of combined flash and RAM, depending on the model, and delivers inference latencies from 292 µs (≈ 3.4 k inferences/sec on AD) up to 29.5 ms (≈ 34 inferences/sec on VWW). In contrast, the ahead-of-time compiled HeliosAOT build cuts total memory usage by 35 – 62 %, dropping the footprint to 173 – 491 KB, while achieving comparable or slightly higher throughput (for example, AD rises from ≈ 3.4 k to ≈ 3.6 k inf/sec and VWW from ≈ 34 to ≈ 34.1 inf/sec).
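Throughput here is simply the reciprocal of single-inference latency. As a quick sanity check on the figures above, here is a minimal Python sketch (latency values are copied from the results table below):

```python
# Convert single-inference latency (µs) to throughput (inferences/sec).
# Latencies are taken from the results table in this section.
latencies_us = {
    "AD (HeliosRT)": 292,
    "AD (HeliosAOT)": 275,
    "VWW (HeliosRT)": 29_480,
    "VWW (HeliosAOT)": 29_327,
}

for name, us in latencies_us.items():
    print(f"{name}: {1e6 / us:,.1f} inf/sec")

# Output:
# AD (HeliosRT): 3,424.7 inf/sec    (≈ 3.4 k)
# AD (HeliosAOT): 3,636.4 inf/sec   (≈ 3.6 k)
# VWW (HeliosRT): 33.9 inf/sec      (≈ 34)
# VWW (HeliosAOT): 34.1 inf/sec
```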
These gains make HeliosAOT particularly attractive for production deployments on battery-powered MCUs where every kilobyte counts, while HeliosRT remains the go-to choice when you need on-device model loading or dynamic graph support without reflashing.
Operators
- Add
- Average Pooling
- Convolution
- Depthwise Convolution
- Fully Connected
- Reshape
- Softmax
AI Models
- Anomaly Detection (AD)
- Keyword Spotting (KWS)
- Image Classification (IC)
- Streaming Keyword Spotting (STRM)
- Visual Wake Words (VWW)
Results
The following table summarizes the memory usage and inference latency for each model when run with the HeliosRT and HeliosAOT builds. TEXT, DATA, and BSS are the standard linker section sizes, and DEC is their sum, i.e., the total footprint. The values in parentheses indicate the percentage reduction relative to the HeliosRT build.
MODEL | BUILD | TEXT (KB) | DATA (KB) | BSS (KB) | DEC (KB) | Latency (µs) |
---|---|---|---|---|---|---|
AD | HeliosRT | 201 | 279 | 117 | 596 | 292 |
AD | HeliosAOT | 58 (71%) | 272 (3%) | 30 (74%) | 360 (40%) | 275 |
KWS | HeliosRT | 201 | 61 | 137 | 399 | 8,085 |
KWS | HeliosAOT | 83 (59%) | 31 (49%) | 59 (57%) | 173 (57%) | 8,074 |
IC | HeliosRT | 211 | 105 | 163 | 478 | 20,139 |
IC | HeliosAOT | 78 (63%) | 84 (20%) | 79 (52%) | 241 (50%) | 20,041 |
STRM | HeliosRT | 211 | 82 | 208 | 500 | 1,730 |
STRM | HeliosAOT | 84 (60%) | 54 (34%) | 52 (75%) | 189 (62%) | 1,719 |
VWW | HeliosRT | 200 | 334 | 217 | 751 | 29,480 |
VWW | HeliosAOT | 107 (47%) | 222 (34%) | 163 (25%) | 491 (35%) | 29,327 |
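The column names follow the convention of the GNU `size` tool, where DEC is the decimal sum of TEXT, DATA, and BSS. Assuming that convention, the DEC totals and reduction percentages can be re-derived from the per-section values; a minimal sketch with the table's numbers hard-coded:

```python
# (text, data, bss) in KB for each model, copied from the table above.
sizes = {
    "AD":   {"HeliosRT": (201, 279, 117), "HeliosAOT": (58, 272, 30)},
    "KWS":  {"HeliosRT": (201, 61, 137),  "HeliosAOT": (83, 31, 59)},
    "IC":   {"HeliosRT": (211, 105, 163), "HeliosAOT": (78, 84, 79)},
    "STRM": {"HeliosRT": (211, 82, 208),  "HeliosAOT": (84, 54, 52)},
    "VWW":  {"HeliosRT": (200, 334, 217), "HeliosAOT": (107, 222, 163)},
}

for model, builds in sizes.items():
    dec_rt = sum(builds["HeliosRT"])    # total footprint, HeliosRT
    dec_aot = sum(builds["HeliosAOT"])  # total footprint, HeliosAOT
    reduction = 100 * (dec_rt - dec_aot) / dec_rt
    print(f"{model}: {dec_rt} KB -> {dec_aot} KB ({reduction:.0f}% reduction)")

# Small (±1 KB / ±1 %) differences versus the table come from the
# per-column values being rounded to whole KB before summing.
```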
The following chart shows the memory usage for each model and build.
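A chart of this kind can be regenerated directly from the table. As a minimal matplotlib sketch (the DEC values are hard-coded from the table above; the output filename is illustrative):

```python
import matplotlib.pyplot as plt
import numpy as np

# Total memory footprint (DEC, KB) per model, from the results table.
models = ["AD", "KWS", "IC", "STRM", "VWW"]
helios_rt = [596, 399, 478, 500, 751]
helios_aot = [360, 173, 241, 189, 491]

x = np.arange(len(models))
width = 0.35

fig, ax = plt.subplots()
ax.bar(x - width / 2, helios_rt, width, label="HeliosRT")
ax.bar(x + width / 2, helios_aot, width, label="HeliosAOT")
ax.set_xticks(x)
ax.set_xticklabels(models)
ax.set_ylabel("Total footprint (KB)")
ax.set_title("MLPerf Tiny memory usage: HeliosRT vs HeliosAOT")
ax.legend()
plt.savefig("memory_usage.png", dpi=150)
```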