
Quickstart Guide

Install soundKIT

Syntax

git clone https://github.com/AmbiqAI/soundkit.git
cd soundkit
./install.sh
source .venv/bin/activate # activate the soundKIT virtual environment

Requirements

  • Python 3.11+

Optional (for EVB demo support):


Use soundKIT with CLI

soundKIT provides a unified CLI for handling various ML tasks.

Syntax

soundkit --task [TASK] --mode [MODE] --config [CONFIG]
  • TASK — one of: se, vad, kws, id
  • MODE — one of: data, train, evaluate, export, demo
  • CONFIG — path to your YAML config file

Example: Speech Enhancement (SE) Workflow

Common CLI Usage

soundkit -t se -m data -c configs/se/se.yaml
soundkit -t se -m train -c configs/se/se.yaml

Open TensorBoard in another terminal:

soundkit -t se -m train --tensorboard -c configs/se/se.yaml

Visit http://localhost:6006

soundkit -t se -m evaluate -c configs/se/se.yaml
soundkit -t se -m export -c configs/se/se.yaml
soundkit -t se -m demo -c configs/se/se.yaml demo.platform=evb # deploy to an Ambiq EVB
soundkit -t se -m demo -c configs/se/se.yaml demo.platform=pc # run the demo on a PC

Configuration Parameters (Simplified)

These are the key settings in a soundKIT YAML config for SE tasks:

Top-Level

  • name: Name of the experiment (used in folder names)
  • project: Task type, e.g., se, kws, vad, id
  • job_dir: Where outputs (models, logs) are saved

Data (data)

  • path_tfrecord: Where TFRecords are stored
  • corpora: List of datasets (type: speech, noise, reverb)
  • snr_dbs: List of SNR values for noise mixing (e.g., [0, 5, 10])
  • target_length_in_secs: Length of each audio clip (e.g., 5)
  • reverb_prob: Probability of applying reverb to a clip
  • min_amp/max_amp: Control the audio amplitude range
  • signal.sampling_rate: Sampling rate (e.g., 16000)
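
Putting the top-level and data fields together, a config might look like the sketch below. All values and the nesting of the corpora entries are illustrative assumptions, not defaults shipped with the repo:

```yaml
name: se_experiment            # used in output folder names
project: se                    # task type: se, kws, vad, or id
job_dir: ./jobs                # where models and logs are saved

data:
  path_tfrecord: ./tfrecords/se    # where generated TFRecords are stored
  corpora:                         # dataset list; per-entry layout is assumed
    - type: speech
    - type: noise
    - type: reverb
  snr_dbs: [0, 5, 10]              # SNRs used when mixing noise into speech
  target_length_in_secs: 5         # each clip is 5 seconds long
  reverb_prob: 0.5                 # apply reverb to roughly half the clips
  min_amp: 0.1                     # amplitude range for level augmentation
  max_amp: 0.95
  signal:
    sampling_rate: 16000           # 16 kHz audio
```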

Training (train)

  • initial_lr: Learning rate
  • batchsize: Batch size
  • epochs: Total number of epochs
  • loss_function: Type of loss and its parameters (e.g., mrl_mse)
  • feature: Feature extraction settings (e.g., type, frame_size)
  • model.config_file: NN Model architecture YAML (e.g., config_crnn.yaml)
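
For reference, a train section assembled from the fields above might look like this. The values, the loss parameters, and the feature type are hypothetical; the exact options depend on the configs shipped in the repo:

```yaml
train:
  initial_lr: 0.001              # starting learning rate
  batchsize: 32
  epochs: 100
  loss_function:
    type: mrl_mse                # loss type named in the docs above
  feature:
    type: logmel                 # hypothetical feature type
    frame_size: 480              # samples per frame (30 ms at 16 kHz)
  model:
    config_file: config_crnn.yaml   # NN architecture YAML
```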

Evaluation (evaluate)

  • data.dir: Path to evaluation audio samples
  • data.files: List of test audio files
  • result_folder: Where results are saved

Export (export)

  • tflite_dir: Exported model path (TFLite format)
  • epoch_loaded: Which model checkpoint to export

Demo (demo)

  • epoch_loaded: Which model checkpoint to load for the demo
  • platform: pc or evb (Evaluation Board)
  • evb_dir: Output directory for EVB firmware
  • param_struct_name: Struct name for exported parameters
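
The evaluate, export, and demo sections follow the same pattern. An illustrative sketch, where every path and name is a placeholder rather than a repo default:

```yaml
evaluate:
  data:
    dir: ./eval_audio                # evaluation audio samples
    files: [sample1.wav, sample2.wav]
  result_folder: ./results

export:
  tflite_dir: ./exported             # TFLite model output path
  epoch_loaded: 100                  # checkpoint to export

demo:
  epoch_loaded: 100                  # checkpoint to load
  platform: pc                       # or evb for the Ambiq Evaluation Board
  evb_dir: ./evb_firmware            # output directory for EVB firmware
  param_struct_name: se_params       # hypothetical struct name
```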

Overriding Config Values via OmegaConf

soundKIT uses OmegaConf for configuration management. You can override any value in the config file directly from the CLI using key=value syntax (dot notation).

Example: Change platform to evb at runtime

soundkit -t se -m demo -c configs/se.yaml demo.platform=evb

Example: Override training batch size

soundkit -t se -m train -c configs/se.yaml train.batchsize=64


🦁 Model Zoo

The Model Zoo contains pre-trained, high-performance models ready for immediate evaluation. You can run these on your local PC to verify performance before deploying to hardware.

Switching to Hardware

To deploy any of these models to an Ambiq Apollo EVB, simply append demo.platform=evb to the command.

🎧 Sound Enhancement (SE)

Clean noisy audio streams using state-of-the-art architectures.

  • CRNN Model (Balanced performance/efficiency):

Common CLI Usage

soundkit -t se -m demo -c zoo/se/crnn/se.yaml demo.platform=pc

soundkit -t se -m demo -c zoo/se/crnn/se.yaml demo.platform=evb
Then open the web dashboard at https://ambiqai.github.io/soundkit/web-dashboards/nnse-usb/


  • UNet Model (High-fidelity enhancement):

Common CLI Usage

soundkit -t se -m demo -c zoo/se/unet/se.yaml demo.platform=pc

soundkit -t se -m demo -c zoo/se/unet/se.yaml demo.platform=evb
Then open the web dashboard at https://ambiqai.github.io/soundkit/web-dashboards/nnse-usb/


🗣️ Voice Activity Detection (VAD)

Detect the presence of human speech in diverse environments.

  • Frequency Domain (Robust to stationary noise):

Common CLI Usage

soundkit -t vad -m demo -c zoo/vad/freq_model/vad.yaml demo.platform=pc

soundkit -t vad -m demo -c zoo/vad/freq_model/vad.yaml demo.platform=evb
Then open the web dashboard at https://ambiqai.github.io/soundkit/web-dashboards/sd-usb/


  • Time Domain (End-to-end efficiency):

Common CLI Usage

soundkit -t vad -m demo -c zoo/vad/time_model/vad.yaml demo.platform=pc

soundkit -t vad -m demo -c zoo/vad/time_model/vad.yaml demo.platform=evb
Then open the web dashboard at https://ambiqai.github.io/soundkit/web-dashboards/sd-usb/


🆔 Speaker Identification (ID)

Secure voice-biometrics and speaker classification.

Common CLI Usage

soundkit -t id -m demo -c zoo/id/id.yaml demo.platform=pc

soundkit -t id -m demo -c zoo/id/id.yaml demo.platform=evb
Then open the web dashboard at https://ambiqai.github.io/soundkit/web-dashboards/id-usb/


🔑 Keyword Spotting (KWS)

Low-latency wake-word and command recognition.

Common CLI Usage

soundkit -t kws -m demo -c zoo/kws/kws.yaml demo.platform=pc

soundkit -t kws -m demo -c zoo/kws/kws.yaml demo.platform=evb
Then open the web dashboard at https://ambiqai.github.io/soundkit/web-dashboards/sd-usb/