An AI Development Kit for real-time audio processing on Ambiq ultra-low power devices
Overview
Introducing soundKIT, an Audio AI Development Kit (ADK) that helps you train, evaluate, and deploy real-time audio models on Ambiq's family of ultra-low power SoCs. It ships with datasets, efficient model architectures, and reference tasks out of the box, plus optimization and deployment routines for edge inference. The kit also includes pre-trained models and task-level demos to help you get started quickly.
At its core, soundKIT is a playground: swap datasets, architectures, tasks, and training recipes via YAML, the CLI, or directly in code.
Whether you're prototyping on a PC or deploying to hardware, soundKIT provides a consistent path from data preparation to evaluation and deployment.
For production-grade deployment, soundKIT uses HeliaRT, Ambiq's ultra-efficient edge AI runtime. Optimized specifically for the Apollo family of SoCs, HeliaRT delivers:
- Up to 3x faster inference compared to standard LiteRT (formerly TFLM) implementations.
- Custom AI kernels that leverage Apollo510's vector acceleration hardware.
- Improved int16x8 quantization support, designed for high-fidelity audio and speech processing.
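The int16x8 scheme keeps activations at 16 bits for dynamic range while storing weights at 8 bits for compactness. The NumPy snippet below illustrates the idea behind that scheme; it is not HeliaRT code, and the scale values are arbitrary choices for the example:

```python
import numpy as np

def quantize(x, scale, dtype):
    """Symmetric quantization: round x/scale and clip to the dtype's range."""
    info = np.iinfo(dtype)
    return np.clip(np.round(x / scale), info.min, info.max).astype(dtype)

# Activations use int16 (wide dynamic range), weights use int8 (compact storage).
act = np.array([0.5, -0.25, 0.125], dtype=np.float32)
wts = np.array([0.9, -0.3, 0.6], dtype=np.float32)

act_q = quantize(act, scale=1.0 / 32768, dtype=np.int16)  # illustrative scale
wts_q = quantize(wts, scale=1.0 / 128, dtype=np.int8)     # illustrative scale

# Integer dot product accumulates in int64, then rescales back to float.
acc = np.dot(act_q.astype(np.int64), wts_q.astype(np.int64))
result = acc * (1.0 / 32768) * (1.0 / 128)
print(result)  # close to np.dot(act, wts) == 0.6
```

The wide activation type matters for audio: speech has a large dynamic range, and 16-bit activations lose far less precision than 8-bit ones at the same weight cost.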
Key Features
- Playground for iteration: Swap datasets, models, tasks, and training recipes via YAML, CLI, or Python.
- End-to-end workflow: Data prep, feature extraction, training, evaluation, and export in one place.
- Embedded-first optimization: Quantization-aware workflows, streaming inference patterns, and memory-conscious models.
- Hardware acceleration ready: HeliaRT kernels and NeuralSPOT integration for Apollo-class MCUs.
- Open and extensible: Add new datasets, tasks, and architectures without rewriting the pipeline.
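To illustrate the YAML-driven workflow, a config might pair a dataset, a model, and a training recipe. The keys and values below are hypothetical and only sketch the shape of such a file; actual soundKIT config schemas may differ:

```yaml
# Hypothetical config sketch; actual soundKIT keys may differ.
task: se                  # speech enhancement
dataset:
  speech: librispeech
  noise: musan
  reverb: rirs_noises
  sample_rate: 16000
model:
  arch: tiny_crn          # placeholder architecture name
  quantization: int16x8
train:
  epochs: 50
  batch_size: 64
```

Swapping the dataset, architecture, or recipe then becomes a one-line config change rather than a code rewrite.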
Supported Tasks
- Speech Enhancement (SE): Denoising and dereverberation for clearer speech in noisy environments.
- Keyword Spotting (KWS): Fast, low-footprint wake word detection using TFLite/TFLM-compatible models.
- Voice Activity Detection (VAD): Lightweight, real-time detection of speech presence to save power and reduce false triggers.
- Speaker Verification (ID): On-device speaker ID for secure, private voice authentication, with no cloud needed.
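VAD in particular is easy to understand through a toy example: frame the signal and flag frames whose energy exceeds a threshold. The snippet below is a deliberately simple energy detector for illustration only, not the model soundKIT ships:

```python
import numpy as np

def energy_vad(signal, frame_len=160, threshold=0.01):
    """Flag each frame as speech (True) when its mean energy exceeds threshold.

    A toy energy detector for illustration; real VAD models are far more robust
    to noise and low-level speech than a fixed energy threshold.
    """
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1)
    return energy > threshold

# Silence followed by a louder tone burst (16 kHz, 10 ms frames).
t = np.arange(800) / 16000.0
sig = np.concatenate([np.zeros(800), 0.5 * np.sin(2 * np.pi * 440 * t)])
flags = energy_vad(sig)
print(flags)  # first 5 frames silent, last 5 flagged as speech
```

Production VAD replaces the threshold with a small learned model, but the framing-and-decision loop has the same shape.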
Development Flow
- Prototype on PC: Iterate quickly with real-time input and consistent configs.
- Train and evaluate: Run data prep, training, and benchmarking from the same CLI workflow.
- Export and demo: Generate deployable artifacts and validate on PC or EVB with matching settings.
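Real-time prototyping typically consumes audio in fixed-size hops with overlapping analysis windows. A minimal NumPy sketch of that streaming pattern (illustrative only; soundKIT's actual streaming interfaces may differ):

```python
import numpy as np

def stream_frames(chunks, frame_len=400, hop=160):
    """Yield overlapping frames from a stream of audio chunks.

    Buffers incoming chunks and emits a frame every `hop` samples --
    the pattern behind streaming feature extraction and inference.
    """
    buf = np.empty(0, dtype=np.float32)
    for chunk in chunks:
        buf = np.concatenate([buf, np.asarray(chunk, dtype=np.float32)])
        while len(buf) >= frame_len:
            yield buf[:frame_len].copy()
            buf = buf[hop:]  # slide by the hop size, keeping the overlap

# Simulate a microphone delivering 160-sample (10 ms at 16 kHz) chunks.
chunks = [np.full(160, i, dtype=np.float32) for i in range(5)]
frames = list(stream_frames(chunks))
print(len(frames), frames[0].shape)  # 3 (400,)
```

Because the same loop runs on a PC microphone or an MCU's audio DMA buffer, the PC prototype exercises the exact framing the embedded target will see.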
Why soundKIT?
- Made for iteration: Mix datasets, tasks, and architectures without rewriting the pipeline.
- Embedded-first: Validated on real hardware.
- Modular design: Plug-and-play components for data, models, and tasks.
- Ultra-low power: Ideal for wearables and battery-powered applications.
Use Cases
- Smart home, wearables, and smart glasses
- Industrial and facilities monitoring
- Automotive in-cabin voice control and noise suppression
- Healthcare and wellness devices
- Security and public safety screening
- Consumer audio accessories and conferencing
Modes
soundKIT provides a consistent CLI with task-level modes:
- data: dataset setup and preparation
- train: model training workflows
- evaluate: benchmarking, metrics, and validation
- export: deployment artifact generation
- demo: real-time inference on PC or EVB
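A subcommand CLI of this shape is conventionally built on argparse subparsers. The sketch below mirrors the mode names above but is a hypothetical stand-in, not soundKIT's actual entry point or flag set:

```python
import argparse

def build_parser():
    """Build a parser with one subcommand per soundKIT-style mode."""
    parser = argparse.ArgumentParser(prog="soundkit")
    sub = parser.add_subparsers(dest="mode", required=True)
    for mode in ("data", "train", "evaluate", "export", "demo"):
        p = sub.add_parser(mode)
        # Hypothetical shared flag: every mode reads the same YAML config.
        p.add_argument("--config", help="YAML config path", default=None)
    return parser

args = build_parser().parse_args(["train", "--config", "kws.yaml"])
print(args.mode, args.config)  # train kws.yaml
```

Sharing one `--config` flag across modes is what makes "validate on EVB with the same configs" possible: each stage reads the same file.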
Datasets
soundKIT includes a flexible dataset factory that supports common speech, noise, and reverb corpora (for example LibriSpeech, THCHS-30, MUSAN, FSD50K, ESC-50, RIRS_NOISES). Datasets are not redistributed; users download and use them under their respective licenses.
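A common use of such speech and noise corpora is synthesizing noisy mixtures at a target SNR for enhancement or VAD training. A self-contained sketch of that mixing step in NumPy (not soundKIT's dataset-factory API):

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so the mixture has the requested SNR, then add it to speech."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain such that speech_power / (gain^2 * noise_power) == 10 ** (snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise

rng = np.random.default_rng(0)
# Stand-ins for a clean utterance and a noise clip (1 s at 16 kHz).
speech = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000).astype(np.float32)
noise = rng.standard_normal(16000).astype(np.float32)
noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

In a real pipeline the speech would come from a corpus such as LibriSpeech and the noise from MUSAN, with room impulse responses from RIRS_NOISES convolved in for reverberant conditions.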
Get Started
- Clone the repo and follow QuickStart to set up.
- Run real-time inference on PC.
- Validate on Apollo5 with the same configs.
Web Dashboards
To visualize data from your Apollo5 EVB via WebUSB, use the dashboards:

