Quickstart Guide
Install soundKIT
git clone https://github.com/AmbiqAI/soundkit.git
cd soundkit
./install.sh
source .venv/bin/activate # activate the soundKIT virtual environment
Requirements
- Python 3.11+
Optional (for EVB demo support):
- Arm GNU Toolchain 12.2
- Segger J-Link 7.92
Use the soundKIT CLI
soundKIT provides a unified CLI that covers the full workflow for each task: data preparation, training, evaluation, export, and demo.
Syntax
soundkit --task [TASK] --mode [MODE] --config [CONFIG]
- TASK: one of se, vad, kws, id
- MODE: one of data, train, evaluate, export, demo
- CONFIG: path to your YAML config file
Example: Speech Enhancement (SE) Workflow
Common CLI Usage
soundkit -t se -m data -c configs/se/se.yaml # prepare the training data (TFRecords)
soundkit -t se -m train -c configs/se/se.yaml # train the model
Open TensorBoard in another terminal:
soundkit -t se -m train --tensorboard -c configs/se/se.yaml
Visit http://localhost:6006
soundkit -t se -m evaluate -c configs/se/se.yaml # evaluate the trained model
soundkit -t se -m export -c configs/se/se.yaml # export the model to TFLite
soundkit -t se -m demo -c configs/se/se.yaml demo.platform=evb # deploy the demo to an Ambiq EVB
soundkit -t se -m demo -c configs/se/se.yaml demo.platform=pc # run the demo on a PC
Configuration Parameters (Simplified)
Understand key settings in your soundKIT YAML config for SE tasks:
Top-Level
- name: Name of the experiment (used in folder names)
- project: Task type, e.g. se, kws, vad, id
- job_dir: Where outputs (models, logs) are saved
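For illustration, the top-level keys could look like this (the values are placeholders, not shipped defaults):

name: se_demo             # experiment name, reused in output folder names
project: se               # task type: se, kws, vad, or id
job_dir: ./jobs/se_demo   # where models and logs are saved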
Data (data)
- path_tfrecord: Where TFRecords are stored
- corpora: List of datasets (type: speech, noise, reverb)
- snr_dbs: List of SNR values for noise mixing (e.g. [0, 5, 10])
- target_length_in_secs: Length of each audio clip in seconds (e.g. 5)
- reverb_prob: Probability of applying reverb
- min_amp / max_amp: Controls the audio amplitude range
- signal.sampling_rate: Sampling rate in Hz (e.g. 16000)
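As a sketch, the data block might be laid out as follows. The example values come from the list above; the corpora entry fields and the exact nesting are assumptions, not taken from a shipped config:

data:
  path_tfrecord: ./tfrecords        # where generated TFRecords are stored
  corpora:                          # dataset list; the path field is hypothetical
    - type: speech
      path: ./datasets/clean_speech
    - type: noise
      path: ./datasets/noise
  snr_dbs: [0, 5, 10]               # SNR values used when mixing in noise
  target_length_in_secs: 5          # length of each audio clip
  reverb_prob: 0.5                  # illustrative probability of applying reverb
  min_amp: 0.3                      # illustrative amplitude range
  max_amp: 0.95
  signal:
    sampling_rate: 16000            # Hz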
Training (train)
- initial_lr: Initial learning rate
- batchsize: Batch size
- epochs: Total number of epochs
- loss_function: Type of loss and its parameters (e.g. mrl_mse)
- feature: Feature extraction settings (e.g. type, frame_size)
- model.config_file: NN model architecture YAML (e.g. config_crnn.yaml)
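A comparable sketch for the train block; the nesting of loss_function, feature, and model, as well as the feature values, are assumptions:

train:
  initial_lr: 0.001                 # illustrative learning rate
  batchsize: 64
  epochs: 100                       # illustrative epoch count
  loss_function:
    type: mrl_mse
  feature:
    type: spectrogram               # hypothetical feature type
    frame_size: 512                 # hypothetical frame size
  model:
    config_file: config_crnn.yaml   # NN architecture definition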
Evaluation (evaluate)
- data.dir: Path to evaluation audio samples
- data.files: List of test audio files
- result_folder: Where results are saved
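An illustrative evaluate block; the paths and file names are placeholders:

evaluate:
  data:
    dir: ./eval_samples             # evaluation audio directory
    files: [noisy_01.wav, noisy_02.wav]
  result_folder: ./results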
Export (export)
- tflite_dir: Exported model path (TFLite format)
- epoch_loaded: Which model checkpoint to export
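An illustrative export block; the path and epoch are placeholders:

export:
  tflite_dir: ./tflite_models       # where the TFLite model is written
  epoch_loaded: 100                 # checkpoint epoch to export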
Demo (demo)
- epoch_loaded: Which model checkpoint to load for the demo
- platform: pc or evb (Evaluation Board)
- evb_dir: Output directory for EVB firmware
- param_struct_name: Struct name for exported parameters
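An illustrative demo block; every value here is a placeholder:

demo:
  epoch_loaded: 100                 # checkpoint epoch to load
  platform: pc                      # pc or evb
  evb_dir: ./evb_firmware           # output directory for EVB firmware
  param_struct_name: se_params      # struct name for exported parameters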
Overriding Config Values via OmegaConf
soundKIT uses OmegaConf for configuration management. You can override any value in the config file directly from the CLI using key=value syntax (dot notation).
Example: Change platform to evb at runtime
soundkit -t se -m demo -c configs/se/se.yaml demo.platform=evb
Example: Override training batch size
soundkit -t se -m train -c configs/se/se.yaml train.batchsize=64
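Multiple overrides can likely be combined in a single call, assuming the CLI forwards every trailing key=value pair to OmegaConf (the epoch count below is illustrative):
soundkit -t se -m train -c configs/se/se.yaml train.batchsize=64 train.epochs=100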
🦁 Model Zoo
The Model Zoo contains pre-trained, high-performance models ready for immediate evaluation. You can run these on your local PC to verify performance before deploying to hardware.
Switching to Hardware
To deploy any of these models to an Ambiq Apollo EVB, simply append demo.platform=evb to the command.
🎧 Speech Enhancement (SE)
Clean noisy audio streams using state-of-the-art architectures.
- CRNN Model (Balanced performance/efficiency):
Common CLI Usage
soundkit -t se -m demo -c zoo/se/crnn/se.yaml demo.platform=pc
soundkit -t se -m demo -c zoo/se/crnn/se.yaml demo.platform=evb
- UNet Model (High-fidelity enhancement):
Common CLI Usage
soundkit -t se -m demo -c zoo/se/unet/se.yaml demo.platform=pc
soundkit -t se -m demo -c zoo/se/unet/se.yaml demo.platform=evb
🗣️ Voice Activity Detection (VAD)
Detect the presence of human speech in diverse environments.
- Frequency Domain (Robust to stationary noise):
Common CLI Usage
soundkit -t vad -m demo -c zoo/vad/freq_model/vad.yaml demo.platform=pc
soundkit -t vad -m demo -c zoo/vad/freq_model/vad.yaml demo.platform=evb
- Time Domain (End-to-end efficiency):
Common CLI Usage
soundkit -t vad -m demo -c zoo/vad/time_model/vad.yaml demo.platform=pc
soundkit -t vad -m demo -c zoo/vad/time_model/vad.yaml demo.platform=evb
🆔 Speaker Identification (ID)
Secure voice-biometrics and speaker classification.
Common CLI Usage
soundkit -t id -m demo -c zoo/id/id.yaml demo.platform=pc
soundkit -t id -m demo -c zoo/id/id.yaml demo.platform=evb
🔑 Keyword Spotting (KWS)
Low-latency wake-word and command recognition.
Common CLI Usage
soundkit -t kws -m demo -c zoo/kws/kws.yaml demo.platform=pc
soundkit -t kws -m demo -c zoo/kws/kws.yaml demo.platform=evb