# Voice Activity Detection (VAD)
The Voice Activity Detection (VAD) module in soundKIT enables robust detection of speech activity in noisy audio environments, suitable for real-time and embedded deployments. It supports:
- ✅ Data preparation
- ✅ Model training
- ✅ Evaluation
- ✅ Model export
- ✅ Real-time inference (demo)
Optimized for edge deployment on Ambiq's ultra-low power SoCs, VAD ensures efficient voice detection even in constrained environments.
📘 Try it now: Explore the VAD Tutorial Notebook to get started.
## Features
- Frame-level voice activity prediction
- Real-time processing for embedded or browser-based use
- Modular architecture: use or extend CRNN-based backbones
- TFLite and C-array export for low-power MCUs
- Seamless PC or EVB demo support
## Install soundKIT
See the QuickStart Guide to set up your environment.
## VAD Task Modes

The `soundkit` CLI supports multiple modes for running the VAD task. Each is configured via a YAML file (e.g., `vad.yaml`). Here is an example configuration:
`vad.yaml`:

```yaml
name: crnn_experiment
project: vad
job_dir: ./soundkit/tasks/vad

data:
  path_tfrecord: ${job_dir}/tfrecords
  tfrecord_datalist_name:
    train: train_tfrecord.csv
    val: val_tfrecord.csv
  num_samples_per_noise:
    train: 50000
    val: 3590
  force_download: false
  reverb_prob: 0.2
  num_processes: 8
  corpora:
    - {name: vad_train-clean-100, type: speech, split: train}
    - {name: vad_train-clean-360, type: speech, split: train}
    - {name: vad_dev-clean, type: speech, split: val}
    - {name: vad_thchs30, type: speech, split: train-val}
    - {name: ESC-50-master, type: noise, split: train-val}
    - {name: FSD50K, type: noise, split: train-val}
    - {name: musan, type: noise, split: train-val}
    - {name: wham_noise, type: noise, split: train-val}
    - {name: rirs_noises, type: reverb, split: train-val}
  snr_dbs: [-12, -9, -6, -3, 0, 3, 6, 9, 12, 15, 30]
  target_length_in_secs: 5
  min_amp: 0.01
  max_amp: 0.95
  signal:
    sampling_rate: 16000
    dc_removal: true
  debug: false

train:
  initial_lr: 4e-4
  batchsize: 128
  epochs: 150
  warmup_epochs: 5
  epoch_loaded: random
  loss_function:
    type: cross_entropy
    params: {}
  path:
    full_name: ${name}_loss-${train.feature.type}_drop-${train.model.override.dropout_rate}_stridetime-${train.model.override.stride_time}_mvn-${train.standardization}_units-${train.model.override.units}_sr-${data.signal.sampling_rate}
    checkpoint_dir: ${job_dir}/models_trained/${train.path.full_name}
    tensorboard_dir: ${job_dir}/tensorboard/${train.path.full_name}
  num_lookahead: 0
  feature:
    frame_size: 480
    hop_size: 160
    fft_size: 512
    type: logpspec
    bins: 257
  standardization: true
  model:
    config_dir: ./soundkit/models/arch_configs
    config_file: config_crnn_vad.yaml
    override:
      units: 22
      dropout_rate_input: 0.1
      dropout_rate: 0.2
      stride_time: 1
      len_time: 6
  debug: false

evaluate:
  epoch_loaded: best
  data:
    dir: "./wavs/vad/test_wavs"
    files: [rpc_audio_raw.wav, speech.wav, i_like_steak.wav, keyboard_steak.wav, steak_hairdryer.wav]
  result_folder: ${job_dir}/test_results/${train.path.full_name}

export:
  epoch_loaded: best
  tflite_dir: ${job_dir}/tflite

demo:
  platform: evb
  epoch_loaded: best
  tflite_dir: ${job_dir}/tflite
  evb_dir: ${job_dir}/evb
  pre_gain: 1
  filename: def_nn1_nnvad
  param_struct_name: params_nn1_nnvad
```
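The `${...}` references above (e.g., `${job_dir}`, `${train.path.full_name}`) are interpolations resolved when the config is loaded, in the style of OmegaConf. As a rough illustration of how such references resolve (this simplified resolver is hypothetical, not soundKIT's actual loader):

```python
import re

# Hypothetical, pared-down stand-in for the config after YAML parsing:
# string values may reference other keys with ${dotted.path}.
config = {
    "name": "crnn_experiment",
    "job_dir": "./soundkit/tasks/vad",
    "export": {"tflite_dir": "${job_dir}/tflite"},
}

def lookup(cfg, dotted):
    """Walk a dotted path like 'export.tflite_dir' through nested dicts."""
    node = cfg
    for part in dotted.split("."):
        node = node[part]
    return node

def resolve(cfg, value):
    """Replace each ${path} with the (recursively resolved) target value."""
    pattern = re.compile(r"\$\{([^}]+)\}")
    while isinstance(value, str) and pattern.search(value):
        value = pattern.sub(
            lambda m: str(resolve(cfg, lookup(cfg, m.group(1)))), value
        )
    return value

print(resolve(config, config["export"]["tflite_dir"]))  # ./soundkit/tasks/vad/tflite
```

This is why overriding a single key such as `job_dir` on the command line moves every derived path (tfrecords, checkpoints, TFLite output) along with it.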
## VAD Task Mode Overview
### Data

Prepare training and validation examples by mixing speech and noise with controlled SNRs and reverb.

```shell
soundkit -t vad -m data -c configs/vad/vad.yaml
```
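Mixing at a controlled SNR means scaling the noise so the speech-to-noise power ratio hits a target drawn from `snr_dbs`. The core gain computation can be sketched as follows (a pure-Python illustration of the standard RMS-based method, not soundKIT's actual implementation):

```python
import math

def rms(x):
    """Root-mean-square level of a sample sequence."""
    return math.sqrt(sum(s * s for s in x) / len(x))

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals snr_db, then sum."""
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]

# Toy signals: at 0 dB SNR the scaled noise ends up at the same RMS as the speech.
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [((t * 1103515245 + 12345) % 2**31) / 2**30 - 1.0 for t in range(16000)]
mixture = mix_at_snr(speech, noise, snr_db=0)
```

The `min_amp`/`max_amp` fields in the config suggest an additional amplitude-normalization step after mixing, which is omitted here.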
### Train

Train the VAD model with your prepared dataset and configuration.

```shell
soundkit -t vad -m train -c configs/vad/vad.yaml
```

To monitor progress, start TensorBoard in a separate terminal:

```shell
soundkit -t vad -m train --tensorboard -c configs/vad/vad.yaml
```

See Train for guidance.
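The `initial_lr: 4e-4` and `warmup_epochs: 5` settings imply a learning-rate warmup over the first epochs. The decay that follows is an assumption here (cosine decay is a common choice), so treat this as a sketch of the idea rather than soundKIT's actual schedule:

```python
import math

INITIAL_LR = 4e-4   # train.initial_lr from vad.yaml
WARMUP_EPOCHS = 5   # train.warmup_epochs
EPOCHS = 150        # train.epochs

def learning_rate(epoch):
    """Linear warmup to INITIAL_LR, then an (assumed) cosine decay toward zero."""
    if epoch < WARMUP_EPOCHS:
        # Ramp up linearly so early large-batch updates don't destabilize training.
        return INITIAL_LR * (epoch + 1) / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return INITIAL_LR * 0.5 * (1 + math.cos(math.pi * progress))
```

Warmup matters here because `epoch_loaded: random` starts from freshly initialized weights, where full-size gradient steps are noisiest.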
### Evaluate

Evaluate the model on real recordings to visualize predicted voice activity.

```shell
soundkit -t vad -m evaluate -c configs/vad/vad.yaml
```
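Since the model emits frame-level speech probabilities, turning them into start/end segments for visualization typically involves hysteresis thresholding to avoid flicker near the decision boundary. A small illustrative post-processing sketch (the thresholds are made up, not soundKIT defaults):

```python
def frames_to_segments(probs, on=0.6, off=0.4):
    """Convert per-frame speech probabilities to (start, end) frame segments,
    entering speech above `on` and leaving only below `off` (hysteresis)."""
    segments, start, active = [], 0, False
    for i, p in enumerate(probs):
        if not active and p >= on:
            active, start = True, i
        elif active and p < off:
            segments.append((start, i))
            active = False
    if active:  # close a segment still open at end of stream
        segments.append((start, len(probs)))
    return segments

probs = [0.1, 0.2, 0.7, 0.8, 0.5, 0.3, 0.1, 0.9, 0.9, 0.2]
print(frames_to_segments(probs))  # [(2, 5), (7, 9)]
```

Note that the frame at probability 0.5 stays inside the first segment: once active, the detector only releases below `off`, which is what suppresses rapid on/off toggling.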
### Export

Convert the model to embedded-friendly formats (TFLite, C headers) for deployment.

```shell
soundkit -t vad -m export -c configs/vad/vad.yaml
```
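The C-array step embeds the `.tflite` flatbuffer as a constant byte array so the MCU firmware can link the model directly, with no filesystem. Conceptually it is equivalent to `xxd -i`; a minimal sketch (the array name follows the config's `demo.filename`, but the exact header layout soundKIT emits may differ):

```python
def tflite_to_c_array(model_bytes, name="def_nn1_nnvad"):
    """Render raw model bytes as a C unsigned-char array plus a length constant."""
    body = ",\n  ".join(
        ", ".join(f"0x{b:02x}" for b in model_bytes[i : i + 12])
        for i in range(0, len(model_bytes), 12)
    )
    return (
        f"const unsigned char {name}[] = {{\n  {body}\n}};\n"
        f"const unsigned int {name}_len = {len(model_bytes)};\n"
    )

# In practice model_bytes would be read from the exported .tflite file;
# a few stand-in bytes keep this sketch self-contained.
header = tflite_to_c_array(b"TFL3\x00\x01")
print(header)
```

On the firmware side the array is handed to the TFLite Micro interpreter as-is, so keeping it `const` lets the linker place it in flash rather than RAM.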
### Demo

Test the model in real time on either a PC or EVB hardware. We suggest running on your PC first and trying the EVB later:

```shell
soundkit -t vad -m demo -c configs/vad/vad.yaml demo.platform=pc
# or
soundkit -t vad -m demo -c configs/vad/vad.yaml demo.platform=evb
```