# Data Preparation
This page explains how to prepare training, validation, and test datasets for Speech Enhancement (SE) using the soundkit CLI.
The dataset preparation process mixes clean speech with noise (and optional reverb), applies SNR scaling and amplitude augmentation, and saves the synthesized examples into TFRecords for training.
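The synthesis steps described above can be sketched in NumPy. This is a minimal, hypothetical illustration, not soundKIT's implementation. The function name `mix_example`, the order of operations (reverb applied to the speech before mixing), and the reading of `min_amp`/`max_amp` as a target peak range are all assumptions.

```python
import numpy as np


def mix_example(clean, noise, snr_db, rir=None, reverb_prob=0.0,
                min_amp=0.1, max_amp=1.0, rng=None):
    """Synthesize one (target, noisy) pair.

    Hypothetical sketch of the documented steps, not soundKIT's code.
    Assumptions: reverb is applied to the speech before mixing, and the
    amplitude range sets the peak of the final noisy signal.
    """
    rng = np.random.default_rng() if rng is None else rng
    clean = np.asarray(clean, dtype=np.float64)
    # Tile (or crop) the noise so it matches the clean-speech length.
    noise = np.resize(np.asarray(noise, dtype=np.float64), clean.shape)

    # Optional room reverb: convolve with an impulse response, keep the original length.
    if rir is not None and rng.random() < reverb_prob:
        clean = np.convolve(clean, rir)[: clean.shape[0]]

    # Rescale the noise so that 10 * log10(P_clean / P_noise) equals snr_db.
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12  # guard against silent noise
    noise = noise * np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    noisy = clean + noise

    # Amplitude augmentation: one random gain for both signals so that the
    # noisy peak lands in [min_amp, max_amp].
    peak = rng.uniform(min_amp, max_amp)
    gain = peak / (np.max(np.abs(noisy)) + 1e-12)
    return clean * gain, noisy * gain
```

In a full pipeline, `snr_db` would presumably be drawn from the configured `snr_dbs` list (for example with `rng.choice(snr_dbs)`). Applying the same gain to both signals keeps the target and the noisy input on a consistent scale.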
## Run `data` Mode
```bash
soundkit -t se -m data -c configs/se/se.yaml
```
## Data Parameters
| Parameter | Description |
|---|---|
| `path_tfrecord` | Output directory for the generated TFRecords. |
| `tfrecord_datalist_name` | Name of the CSV file listing the TFRecord shards for training and validation. The file is saved in the `path_tfrecord` directory. |
| `num_samples_per_noise` | Number of clean-speech samples generated per noise type for the train and val splits. |
| `force_download` | If true, forces the corpora to be downloaded again. |
| `reverb_prob` | Probability of applying room reverb using impulse responses. |
| `num_processes` | Number of parallel processes used for synthesis. |
| `corpora` | List of dataset definitions for training and evaluation. Each entry specifies `name` (dataset name; must match a loader in soundKIT), `type` (`speech`, `noise`, or `reverb`), and `split` (`train`, `val`, or `train-val`). Default names and types are provided in `soundkit.defines`. |
| `snr_dbs` | List of SNRs (in dB) used when mixing clean speech with noise. |
| `target_length_in_secs` | Duration of each synthesized example, in seconds. |
| `min_amp`, `max_amp` | Amplitude scaling range used to randomly scale synthesized signals. |
| `debug` | If true, enables additional debug logging. |
| `signal` | Signal settings: `signal.sampling_rate` is the target sampling rate; `signal.dc_remove` controls whether DC removal is applied to the training examples. |
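As a sanity check on these parameters, the sketch below assumes the YAML has already been loaded into a plain Python dict (for example with PyYAML's `yaml.safe_load`). The helper names `check_data_params` and `samples_per_example` are hypothetical, the exact nesting inside `se.yaml` is assumed, and the values are illustrative rather than soundKIT defaults. It also shows how `target_length_in_secs` and `signal.sampling_rate` together fix the length of each example in samples.

```python
def check_data_params(params):
    """Return a list of problems in a data-mode parameter dict (hypothetical helper)."""
    errors = []
    if not 0.0 <= params["reverb_prob"] <= 1.0:
        errors.append("reverb_prob must be in [0, 1]")
    # Assumption: a zero or inverted amplitude range is treated as a mistake.
    if not 0.0 < params["min_amp"] <= params["max_amp"]:
        errors.append("need 0 < min_amp <= max_amp")
    if params["num_processes"] < 1:
        errors.append("num_processes must be >= 1")
    if not params["snr_dbs"]:
        errors.append("snr_dbs must contain at least one value")
    return errors


def samples_per_example(params):
    """Length of one synthesized example, in samples."""
    return round(params["target_length_in_secs"] * params["signal"]["sampling_rate"])


# Illustrative values only; these are not soundKIT's defaults.
example_params = {
    "reverb_prob": 0.3,
    "num_processes": 4,
    "snr_dbs": [-5, 0, 5, 10, 15],
    "target_length_in_secs": 5,
    "min_amp": 0.05,
    "max_amp": 0.95,
    "signal": {"sampling_rate": 16000, "dc_remove": True},
}
```

With these illustrative values, `check_data_params(example_params)` returns an empty list and each example is 5 s × 16000 Hz = 80000 samples long.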
## How Corpora Are Defined
soundKIT uses the corpora field in YAML config files to specify the datasets to be used during training and evaluation. Each dataset is defined by:
- `name`: the registered name of the dataset (must match a loader function)
- `type`: one of `speech`, `noise`, or `reverb`
- `split`: which parts of the dataset to use (`train`, `val`, or `train-val`)
## Default Corpora
Below is the list of default corpora supported by soundKIT. Detailed descriptions of each corpus are in the Corpora documentation.
```yaml
corpora:
  - {name: train-clean-360, type: speech, split: train}
  - {name: train-clean-100, type: speech, split: train}
  - {name: dev-clean, type: speech, split: val}
  - {name: thchs30, type: speech, split: train-val}
  - {name: ESC-50-master, type: noise, split: train-val}
  - {name: FSD50K, type: noise, split: train-val}
  - {name: musan, type: noise, split: train-val}
  - {name: wham_noise, type: noise, split: train-val}
  - {name: rirs_noises, type: reverb, split: train-val}
```
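To see which corpora feed each split, the default list can be grouped by split and type. The helper `build_pools` is hypothetical, and it assumes that `train-val` means a corpus contributes to both the train and val splits; soundKIT may instead partition such a corpus internally.

```python
from collections import defaultdict

VALID_TYPES = {"speech", "noise", "reverb"}
# Assumption: "train-val" corpora contribute to both splits.
SPLITS = {"train": ("train",), "val": ("val",), "train-val": ("train", "val")}

DEFAULT_CORPORA = [
    {"name": "train-clean-360", "type": "speech", "split": "train"},
    {"name": "train-clean-100", "type": "speech", "split": "train"},
    {"name": "dev-clean", "type": "speech", "split": "val"},
    {"name": "thchs30", "type": "speech", "split": "train-val"},
    {"name": "ESC-50-master", "type": "noise", "split": "train-val"},
    {"name": "FSD50K", "type": "noise", "split": "train-val"},
    {"name": "musan", "type": "noise", "split": "train-val"},
    {"name": "wham_noise", "type": "noise", "split": "train-val"},
    {"name": "rirs_noises", "type": "reverb", "split": "train-val"},
]


def build_pools(corpora):
    """Group corpus names as pools[split][type] -> [names], validating each entry."""
    pools = defaultdict(lambda: defaultdict(list))
    for entry in corpora:
        if entry["type"] not in VALID_TYPES:
            raise ValueError(f"{entry['name']}: unknown type {entry['type']!r}")
        if entry["split"] not in SPLITS:
            raise ValueError(f"{entry['name']}: unknown split {entry['split']!r}")
        for split in SPLITS[entry["split"]]:
            pools[split][entry["type"]].append(entry["name"])
    return pools
```

Under this assumption, the val split draws speech from `dev-clean` and `thchs30`, and every split shares the same four noise corpora and `rirs_noises` for reverb.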
## Custom Datasets
Want to use your own data? soundKIT makes it easy to register your own speech, noise, or reverb datasets. See the BYOD guide for details.
## Output
Running `data` mode generates:
- TFRecord files (e.g., `train-00001.tfrecord`) in `./soundkit/tasks/se/tfrecords`
- CSV index files (`train_tfrecord.csv`, `val_tfrecord.csv`) that reference the TFRecord shards
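Downstream code usually needs the shard paths from a CSV index. The column layout of these CSV files is not documented on this page, so this sketch (the helper `read_shard_list` is hypothetical) simply collects every cell that ends in `.tfrecord`.

```python
import csv


def read_shard_list(csv_path):
    """Return the TFRecord shard paths listed in a CSV index file.

    Assumption: the column layout is unknown, so any cell ending in
    '.tfrecord' is treated as a shard path; header cells are skipped.
    """
    shards = []
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            shards.extend(cell.strip() for cell in row
                          if cell.strip().endswith(".tfrecord"))
    return shards
```

The returned list could then be passed to `tf.data.TFRecordDataset`, which accepts a list of filenames. Parsing the records themselves would additionally require the feature schema that soundKIT writes, which is not described here.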