Speech Enhancement (SE)
The Speech Enhancement (SE) module in soundKIT enables denoising of speech signals for real-time and embedded applications. It is designed for both research and deployment, supporting:
- ✅ Data preparation
- ✅ Model training
- ✅ Evaluation
- ✅ Model export
- ✅ Real-time inference (demo)
This module is optimized for deployment on Ambiq's family of ultra-low power SoCs, enabling efficient and low-latency speech enhancement on edge devices.
📘 Try it now: Explore the SE Tutorial Notebook for a hands-on walkthrough.
Features
- Noise suppression for clean speech recovery
- Real-time frame-by-frame inference
- Modular support for CRNN and UNet architectures
- Export for embedded deployment (TFLite, CMSIS, etc.)
- Demo on Ambiq's family of ultra-low power SoCs via WebUSB
Install soundKIT
Follow the instructions in the QuickStart to set up your environment.
SE Task Modes
The soundkit CLI provides multiple modes for running the SE task. All modes are configured through a YAML file (e.g., se.yaml). Below is a breakdown of the configuration structure and CLI commands.
se.yaml
name: unet_experiment
project: se
job_dir: ./soundkit/tasks/s
data:
  path_tfrecord: ${job_dir}/tfrecords
  tfrecord_datalist_name: # list of saved tfrecords
    train: train_tfrecord.csv
    val: val_tfrecord.csv
  num_samples_per_noise:
    train: 1000
    val: 250
  force_download: false
  reverb_prob: 0.5
  num_processes: 8
  corpora:
    - {name: train-clean-360, type: speech, split: train}
    - {name: train-clean-100, type: speech, split: train}
    - {name: dev-clean, type: speech, split: val}
    - {name: thchs30, type: speech, split: train-val}
    - {name: ESC-50-master, type: noise, split: train-val}
    - {name: FSD50K, type: noise, split: train-val}
    - {name: musan, type: noise, split: train-val}
    - {name: wham_noise, type: noise, split: train-val}
    - {name: rirs_noises, type: reverb, split: train-val}
  snr_dbs: [-6, -3, 0, 3, 6, 9, 12, 15, 30] # mixture of signal-to-noise ratios
  target_length_in_secs: 5
  min_amp: 0.03
  max_amp: 0.95
  signal:
    sampling_rate: 16000
    dc_removal: true
  debug: false
train:
  initial_lr: 4e-4
  batchsize: 32
  epochs: 150
  warmup_epochs: 5
  epoch_loaded: random
  loss_function: {
    type: compressed_mse,
    params: {exp: 0.6, eps: 1e-8}
  }
  path:
    full_name: ${name}_unit64_la${train.num_lookahead}_dropout0.2_${train.feature.type}_feat
    model_dir: ${job_dir}/models_trained/${train.path.full_name}
    tensorboard_dir: ${job_dir}/tensorboard/${train.path.full_name}
  num_lookahead: 2
  feature:
    frame_size: 480
    hop_size: 160
    fft_size: 512
    type: logpspec
    bins: 257
    # type: hybrid
    # bins_fft: 100
    # n_mels: 72
    standardization: true
  model:
    config_dir: ./soundkit/models/arch_configs
    config_file: config_unet.yaml
  debug: false
evaluate:
  epoch_loaded: best
  data:
    dir: "./wavs/se/test_wavs"
    files: [keyboard_steak.wav, i_like_steak.wav, steak_hairdryer.wav]
    # # dir: ./wavs/LibriSpeech/test-clean
    # # files:
  result_folder: ${job_dir}/test_results/${train.path.full_name}
export:
  epoch_loaded: best
  tflite_dir: ${job_dir}/tflite
demo:
  platform: pc
  epoch_loaded: best
  tflite_dir: ${job_dir}/tflite
  evb_dir: ${job_dir}/evb
  pre_gain: 1
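The `${...}` references in se.yaml point at other keys in the same file. To illustrate what the path entries expand to, the sketch below implements a minimal resolver in plain Python. It assumes OmegaConf-style dotted lookups and that `path`, `feature`, and `num_lookahead` sit under `train` (as `${train.path.full_name}` and `${train.feature.type}` suggest). `lookup` and `resolve` are hypothetical helpers, not part of soundKIT, and the toolkit's own loader may behave differently.

```python
import re

# Subset of the se.yaml values, copied as they appear in the listing.
config = {
    "name": "unet_experiment",
    "job_dir": "./soundkit/tasks/s",
    "train": {
        "num_lookahead": 2,
        "feature": {"type": "logpspec"},
        "path": {
            "full_name": "${name}_unit64_la${train.num_lookahead}"
                         "_dropout0.2_${train.feature.type}_feat",
            "model_dir": "${job_dir}/models_trained/${train.path.full_name}",
        },
    },
}

_REFERENCE = re.compile(r"\$\{([^}]+)\}")


def lookup(cfg, dotted_key):
    """Walk a nested dict using a dotted key such as 'train.feature.type'."""
    node = cfg
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve(cfg, value):
    """Expand ${a.b.c} references repeatedly until none remain.

    Hypothetical helper: no cycle detection, so a self-referencing key
    would loop forever.
    """
    while isinstance(value, str) and _REFERENCE.search(value):
        value = _REFERENCE.sub(lambda m: str(lookup(cfg, m.group(1))), value)
    return value


full_name = resolve(config, config["train"]["path"]["full_name"])
model_dir = resolve(config, config["train"]["path"]["model_dir"])
print(full_name)
print(model_dir)
```

With the values shown, `full_name` becomes `unet_experiment_unit64_la2_dropout0.2_logpspec_feat`, so changing `num_lookahead` or the feature type gives each experiment its own model and TensorBoard directory.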
SE Task Mode Selection
Download and prepare the training and validation data by generating TFRecords from raw audio corpora.
soundkit -t se -m data -c configs/se/se.yaml
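The `snr_dbs` list in se.yaml suggests that each training example mixes clean speech with noise at one of the listed signal-to-noise ratios. The sketch below shows the standard way to scale a noise clip to a target SNR before adding it to speech. It is an illustration in plain Python, not soundKIT's data pipeline: `rms`, `mix_at_snr`, and `measured_snr_db` are hypothetical helpers, the signals are synthetic stand-ins, and reverb and amplitude scaling are left out.

```python
import math
import random


def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def mix_at_snr(speech, noise, snr_db):
    """Scale noise so the speech-to-noise level ratio equals snr_db, then add."""
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    scaled_noise = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled_noise)]
    return noisy, scaled_noise


def measured_snr_db(speech, noise):
    """SNR in dB computed from the RMS levels of the two components."""
    return 20 * math.log10(rms(speech) / rms(noise))


rng = random.Random(0)
sampling_rate = 16000

# Synthetic stand-ins: a 220 Hz tone for speech, uniform noise for the noise clip.
speech = [0.5 * math.sin(2 * math.pi * 220 * t / sampling_rate)
          for t in range(sampling_rate)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(sampling_rate)]

snr_dbs = [-6, -3, 0, 3, 6, 9, 12, 15, 30]  # values from se.yaml
snr_db = rng.choice(snr_dbs)

noisy, scaled_noise = mix_at_snr(speech, noise, snr_db)
print(snr_db, round(measured_snr_db(speech, scaled_noise), 3))
```

The measured SNR of the scaled noise matches the requested value, which is a quick sanity check for any mixing code.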
Train the speech enhancement model using the specified configuration and dataset.
soundkit -t se -m train -c configs/se/se.yaml
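The configuration selects `compressed_mse` as the loss, with `exp: 0.6` and `eps: 1e-8`. The source does not define this loss. One common reading is a mean squared error between power-law compressed magnitudes, which reduces the dominance of loud spectral bins. The sketch below shows that reading only. The formula and the function are assumptions, not soundKIT's implementation.

```python
def compressed_mse(target_mag, estimate_mag, exp=0.6, eps=1e-8):
    """Assumed form: mean of ((t + eps)**exp - (e + eps)**exp)**2 over bins.

    target_mag and estimate_mag are non-negative spectral magnitudes.
    This is one plausible definition, labeled as an assumption.
    """
    total = 0.0
    for t, e in zip(target_mag, estimate_mag):
        diff = (t + eps) ** exp - (e + eps) ** exp
        total += diff * diff
    return total / len(target_mag)


# Same absolute error of 2.0, once in a loud bin pair and once in a quiet one.
loud = compressed_mse([10.0], [8.0])
quiet = compressed_mse([2.0], [0.0])
plain_mse = (10.0 - 8.0) ** 2  # identical for both cases without compression

print(loud, quiet, plain_mse)
```

Under this assumed definition, the same absolute error costs more in the quiet bin than in the loud one, whereas plain MSE treats both identically.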
To monitor training progress in real time, open a new terminal and launch TensorBoard:
soundkit -t se -m train --tensorboard -c configs/se/se.yaml
See Train for details.
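The feature settings in se.yaml (`frame_size`, `hop_size`, `fft_size`) together with `sampling_rate` fix the time and frequency resolution the model is trained on. The arithmetic below spells this out. Three points are assumptions rather than facts from the source: that `num_lookahead` is counted in hops, that each frame is zero-padded up to `fft_size`, and that clips are framed without edge padding.

```python
# Values copied from se.yaml.
sampling_rate = 16000          # from the signal block
frame_size = 480
hop_size = 160
fft_size = 512
num_lookahead = 2
target_length_in_secs = 5

frame_ms = 1000 * frame_size / sampling_rate   # analysis window length
hop_ms = 1000 * hop_size / sampling_rate       # time step between frames
bins = fft_size // 2 + 1                       # one-sided spectrum size
zero_pad = fft_size - frame_size               # assumed zero-padding per frame

# Assumption: lookahead is measured in hops of future context.
lookahead_ms = num_lookahead * hop_ms

# Assumption: no padding at clip edges.
num_samples = target_length_in_secs * sampling_rate
num_frames = 1 + (num_samples - frame_size) // hop_size

print(f"window {frame_ms} ms, hop {hop_ms} ms, {bins} bins")
print(f"lookahead {lookahead_ms} ms, {num_frames} frames per clip")
```

The 257 bins computed here agree with `bins: 257` in the configuration for the `logpspec` feature type.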
Evaluate the model on a test set and compute metrics such as SI-SDR, STOI, PESQ, or DNSMOS.
soundkit -t se -m evaluate -c configs/se/se.yaml
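Of the metrics listed, SI-SDR (scale-invariant signal-to-distortion ratio) is simple enough to compute by hand. The sketch below follows the usual definition: project the estimate onto the reference, then compare the energy of that projection with the energy of the residual. It is a plain-Python illustration and not soundKIT's evaluation code. The function name is hypothetical, and the mean removal that many implementations apply first is omitted.

```python
import math


def si_sdr(reference, estimate):
    """Scale-invariant SDR in dB between a clean reference and an estimate.

    Raises ZeroDivisionError for a perfect estimate (zero residual);
    a production version would guard against that.
    """
    dot = sum(r * e for r, e in zip(reference, estimate))
    ref_energy = sum(r * r for r in reference)
    scale = dot / ref_energy

    target = [scale * r for r in reference]            # projection onto reference
    residual = [e - t for e, t in zip(estimate, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(v * v for v in residual)
    return 10 * math.log10(target_energy / residual_energy)


# Toy example: the error is orthogonal to the reference and has 1/100 of its energy.
reference = [1.0, 0.0, -1.0, 0.0]
estimate = [1.0, 0.1, -1.0, -0.1]

score = si_sdr(reference, estimate)
score_scaled = si_sdr(reference, [2.0 * e for e in estimate])
print(round(score, 6), round(score_scaled, 6))
```

Both calls give about 20 dB, which shows the scale invariance: rescaling the enhanced signal does not change the score.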
Convert the trained model into formats suitable for embedded or web deployment (e.g., TFLite, C arrays).
soundkit -t se -m export -c configs/se/se.yaml
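For readers unfamiliar with the "C array" form mentioned above, the sketch below shows the general idea: the bytes of a model file are written out as a `const unsigned char` array that firmware can compile in. This mirrors what `xxd -i` produces and is not soundKIT's exporter. `bytes_to_c_array` and the variable name `se_model_tflite` are hypothetical, and the input is a stand-in byte string rather than a real model.

```python
def bytes_to_c_array(data, var_name="se_model_tflite", per_line=12):
    """Render raw bytes as C source: an array plus its length."""
    rows = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        rows.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(rows)
    return (
        f"const unsigned char {var_name}[] = {{\n{body}\n}};\n"
        f"const unsigned int {var_name}_len = {len(data)};\n"
    )


# Stand-in for the contents of an exported .tflite file.
# For a real file: data = pathlib.Path("<your model>.tflite").read_bytes()
fake_model = bytes(range(16))

c_source = bytes_to_c_array(fake_model)
print(c_source)
```

Embedded toolchains often also need an alignment attribute on the array. That detail depends on the target and is left out here.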
Run real-time inference on either:
- Your PC
- A connected embedded development board (EVB) via WebUSB
We suggest starting with the PC for faster testing.
soundkit -t se -m demo -c configs/se/se.yaml demo.platform=pc
# or
soundkit -t se -m demo -c configs/se/se.yaml demo.platform=evb
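Frame-by-frame inference means the demo receives audio in small chunks and processes one analysis frame per chunk. The sketch below shows only the buffering step, using the `frame_size` (480) and `hop_size` (160) from se.yaml: each new hop of samples slides a fixed-length window forward. `StreamingFramer` is a hypothetical class for illustration. Feature extraction, the model call, and signal reconstruction in the actual demo are not shown and may be organized differently.

```python
from collections import deque


class StreamingFramer:
    """Turn hop-sized chunks into overlapping analysis frames."""

    def __init__(self, frame_size=480, hop_size=160):
        self.frame_size = frame_size
        self.hop_size = hop_size
        # Start from silence so the very first hop already yields a full frame.
        self.buffer = deque([0.0] * frame_size, maxlen=frame_size)

    def push(self, chunk):
        """Append one hop of samples and return the current frame."""
        if len(chunk) != self.hop_size:
            raise ValueError(f"expected {self.hop_size} samples, got {len(chunk)}")
        self.buffer.extend(chunk)   # the oldest hop_size samples drop off the left
        return list(self.buffer)


framer = StreamingFramer()

# Four hops of a ramp signal stand in for microphone input.
stream = [float(i) for i in range(160 * 4)]
frames = [framer.push(stream[i:i + 160]) for i in range(0, len(stream), 160)]

for index, frame in enumerate(frames):
    print(index, len(frame), frame[0], frame[-1])
```

Each call to `push` costs one hop of input (10 ms at 16 kHz) and yields one 480-sample frame, which is the unit a per-frame model would consume.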