
๐Ÿ“ Data Preparation

This page explains how to prepare training, validation, and test datasets for Speech Enhancement (SE) using the soundkit CLI.

The dataset preparation process mixes clean speech with noise (and optional reverb), applies SNR scaling and amplitude augmentation, and saves the synthesized examples into TFRecords for training.
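
The SNR scaling step can be illustrated with a minimal NumPy sketch. This is not soundKIT's implementation; the function name `mix_at_snr` and the random stand-in signals are assumptions made for illustration. The idea is to scale the noise so that the speech-to-noise power ratio matches the requested value in dB before adding it to the clean speech.

```python
import numpy as np


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it.

    Hypothetical helper for illustration only; not part of the soundkit API.
    """
    speech_power = np.mean(speech ** 2)
    noise_power = max(np.mean(noise ** 2), 1e-12)  # guard against silent noise
    # Choose gain g so that speech_power / (g**2 * noise_power) == 10 ** (snr_db / 10)
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)  # stand-in for 1 s of clean speech at 16 kHz
noise = rng.standard_normal(16000)   # stand-in for 1 s of noise
noisy = mix_at_snr(speech, noise, snr_db=5.0)

# Measure the SNR actually achieved from the added noise component
added_noise = noisy - speech
achieved_snr_db = 10.0 * np.log10(np.mean(speech ** 2) / np.mean(added_noise ** 2))
print(f"achieved SNR: {achieved_snr_db:.2f} dB")
```

In this sketch each value in snr_dbs would be passed as `snr_db` for a different synthesized example.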


🔧 Run data Mode

soundkit -t se -m data -c configs/se/se.yaml

🧾 Data Parameters

| Parameter | Description |
|---|---|
| path_tfrecord | Output directory for the generated TFRecords. |
| tfrecord_datalist_name | CSV file listing the TFRecord shards for training and validation. The file is saved under the path_tfrecord directory. |
| num_samples_per_noise | Number of clean-speech samples generated per noise type for the train and val splits. |
| force_download | If true, forces re-download of the corpora. |
| reverb_prob | Probability of applying room reverb using impulse responses. |
| num_processes | Number of parallel processes used for synthesis. |
| corpora | List of dataset definitions for training and evaluation. Each entry specifies name (dataset name, must match a loader in soundKIT), type (speech, noise, or reverb), and split (train, val, or train-val). Default names and types are provided in soundkit.defines. |
| snr_dbs | List of SNRs (in dB) at which clean speech is mixed with noise. |
| target_length_in_secs | Duration of each synthesized example, in seconds. |
| min_amp, max_amp | Amplitude scaling range used to randomly scale synthesized signals. |
| debug | If true, enables additional logging for debugging. |
| signal | Signal settings. signal.sampling_rate: target sampling rate; dc_remove: whether to apply DC removal to the training examples. |
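
To make the roles of target_length_in_secs, dc_remove, and min_amp/max_amp more concrete, here is a rough Python sketch of what such steps could look like. The helper names are hypothetical. Reading the amplitude range as a target peak level, and padding short signals with zeros, are both assumptions; soundKIT may define these differently (for example, as a plain gain factor or by looping the audio).

```python
import numpy as np


def remove_dc(x: np.ndarray) -> np.ndarray:
    """Subtract the mean so the signal has no constant (DC) offset."""
    return x - np.mean(x)


def fit_length(x: np.ndarray, num_samples: int) -> np.ndarray:
    """Crop or zero-pad a 1-D signal to exactly `num_samples` samples."""
    if len(x) >= num_samples:
        return x[:num_samples]
    return np.pad(x, (0, num_samples - len(x)))


def random_amplitude(x, min_amp, max_amp, rng):
    """Rescale so the peak magnitude is drawn uniformly from [min_amp, max_amp].

    Assumption: the amplitude range is interpreted as a target peak level.
    """
    peak = np.max(np.abs(x))
    if peak == 0:
        return x  # silent input: nothing to scale
    return x * (rng.uniform(min_amp, max_amp) / peak)


rng = np.random.default_rng(0)
sampling_rate = 16000          # illustrative value for signal.sampling_rate
target_length_in_secs = 3      # illustrative value
raw = rng.standard_normal(2 * sampling_rate) + 0.5  # 2 s signal with a DC offset

example = remove_dc(raw)
example = fit_length(example, sampling_rate * target_length_in_secs)
example = random_amplitude(example, min_amp=0.1, max_amp=0.95, rng=rng)
print(len(example), float(np.max(np.abs(example))))
```

DC removal is applied before padding here so that the zero-padded tail stays at zero.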

📦 How Corpora Are Defined

soundKIT uses the corpora field in YAML config files to specify the datasets to be used during training and evaluation. Each dataset is defined by:

  • name: The registered name of the dataset (must match a loader function)

  • type: One of speech, noise, or reverb

  • split: Defines which parts of the dataset to use (train, val, or train-val)

🔧 Default Corpora

Below is the list of default corpora supported by soundKIT. Detailed descriptions are available in the Corpora documentation:

corpora:
    - {name: train-clean-360, type: speech, split: train}
    - {name: train-clean-100, type: speech, split: train}
    - {name: dev-clean, type: speech, split: val}
    - {name: thchs30, type: speech, split: train-val}
    - {name: ESC-50-master, type: noise, split: train-val}
    - {name: FSD50K, type: noise, split: train-val}
    - {name: musan, type: noise, split: train-val}
    - {name: wham_noise, type: noise, split: train-val}
    - {name: rirs_noises, type: reverb, split: train-val}
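
Before launching a long synthesis run, it can help to sanity-check the name, type, and split fields of each corpora entry. The following Python sketch is one way to do that. The `CorpusEntry` class and `names_for` helper are hypothetical and not part of soundKIT, and treating train-val as contributing to both splits is an assumption about how the tool interprets that value.

```python
from dataclasses import dataclass

VALID_TYPES = {"speech", "noise", "reverb"}
VALID_SPLITS = {"train", "val", "train-val"}


@dataclass(frozen=True)
class CorpusEntry:
    """One entry of the `corpora` list (illustrative model, not the soundkit API)."""
    name: str
    type: str
    split: str

    def __post_init__(self):
        if self.type not in VALID_TYPES:
            raise ValueError(f"{self.name}: unknown type {self.type!r}")
        if self.split not in VALID_SPLITS:
            raise ValueError(f"{self.name}: unknown split {self.split!r}")

    def covers(self, split: str) -> bool:
        # Assumption: "train-val" means the corpus is used for both splits
        return split in self.split.split("-")


def names_for(corpora, type_, split):
    """Names of all corpora of `type_` that contribute to `split`."""
    return [c.name for c in corpora if c.type == type_ and c.covers(split)]


corpora = [
    CorpusEntry("train-clean-100", "speech", "train"),
    CorpusEntry("dev-clean", "speech", "val"),
    CorpusEntry("thchs30", "speech", "train-val"),
    CorpusEntry("musan", "noise", "train-val"),
]

train_speech = names_for(corpora, "speech", "train")
val_speech = names_for(corpora, "speech", "val")
print(train_speech, val_speech)
```

A check like this only validates the field values; whether a name matches a registered loader still has to be confirmed against soundkit.defines.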

🧩 Custom Datasets

Want to use your own data? soundKIT makes it easy to register your own speech, noise, or reverb datasets. See the BYOD guide for details.


🧪 Output

Running the data mode will generate:

  • TFRecord files (e.g., train-00001.tfrecord) at ./soundkit/tasks/se/tfrecords
  • CSV index files (train_tfrecord.csv, val_tfrecord.csv) referencing TFRecord shards
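
As a quick check after the run, the shard paths can be read back from an index file. The column layout of the CSV is not documented on this page, so the sketch below makes a loose assumption: it collects every cell that ends in .tfrecord, which should tolerate a header row or extra columns. The helper name `list_shards` and the demo file contents are made up for illustration.

```python
import csv
import tempfile
from pathlib import Path


def list_shards(index_csv) -> list:
    """Return every CSV cell that looks like a TFRecord path, in file order."""
    shards = []
    with open(index_csv, newline="") as f:
        for row in csv.reader(f):
            shards.extend(
                cell.strip() for cell in row if cell.strip().endswith(".tfrecord")
            )
    return shards


# Demo on a made-up index file; the real file written by soundKIT may differ
with tempfile.TemporaryDirectory() as tmp:
    index = Path(tmp) / "train_tfrecord.csv"
    index.write_text("train-00001.tfrecord\ntrain-00002.tfrecord\n")
    shards = list_shards(index)

print(shards)
```

The resulting list of paths is the kind of input a TFRecord reader expects, once the real column layout has been confirmed.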