🧠 soundKIT Speech Enhancement Tutorial¶

This tutorial walks you through using the soundKIT CLI for speech enhancement tasks including data preparation, training, evaluation, exporting to TFLite, and deployment.

⚙️ Setup¶

In [2]:

Copied!





# Set working directory to project root (one level up)
import os
os.chdir("..")
print("✅ Working directory set to project root:", os.getcwd())
# Set working directory to project root (one level up)
import os
os.chdir("..")
print("✅ Working directory set to project root:", os.getcwd())

✅ Working directory set to project root: /home/poyupaulchen/Documents/soundkit

In [ ]:

Copied!





# Clone the soundKIT repo and navigate to it
!git clone https://github.com/AmbiqAI/soundkit.git
%cd soundkit

# Install required dependencies
!pip install -e .
# Clone the soundKIT repo and navigate to it
!git clone https://github.com/AmbiqAI/soundkit.git
%cd soundkit

# Install required dependencies
!pip install -e .

🔧 Configuration¶

In [3]:

Copied!





# Show editable training config
config_path = "configs/se.yaml"

# Optionally print the config content
with open(config_path, "r") as f:
    print(f.read())
# Show editable training config
config_path = "configs/se.yaml"

# Optionally print the config content
with open(config_path, "r") as f:
    print(f.read())

name: crnn_experiment
project: se
job_dir: ./soundkit/tasks/se

data:
  path_tfrecord: ${job_dir}/tfrecords
  tfrecord_datalist_name: # list of saved tfrecords
    train: train_tfrecord.csv 
    val: val_tfrecord.csv
  num_samples_per_noise:
    train: 1000
    val: 250
  force_download: false
  reverb_prob: 0.2
  num_processes: 8
  corpora:
    - {name: train-clean-360, type: train, path: wavs/LibriSpeech/train-clean-360}
    - {name: train-clean-100, type: train, path: wavs/LibriSpeech/train-clean-100}
    - {name: dev-clean, type: val, path: wavs/LibriSpeech/dev-clean}
    - {name: thchs30, type: train-val, path: {train: wavs/data_thchs30/train, dev: wavs/data_thchs30/dev}}
    - {name: ESC-50-master, type: noise}
    - {name: FSD50K, type: noise}
    - {name: musan, type: noise}
    - {name: wham_noise, type: noise}
    - {name: rirs_noises, type: reverb}
  snr_dbs: [-6, -3, 0, 3, 6, 9, 12, 15, 30] # mixture of signal-to-noise ratios
  target_length_in_secs: 5
  min_amp: 0.1
  max_amp: 0.95

  signal:
    sampling_rate: 16000
    dc_removal: true
  debug: false

train:
  initial_lr: 4e-4
  batchsize: 32
  epochs: 150
  warmup_epochs: 5
  epoch_loaded: random
  loss_function: {
    type: mse,
    params: {exp: 0.6, eps: 1e-8}
    }
  path:
    model_dir: ${job_dir}/models_trained/${name}
    tensorboard_dir: ${job_dir}/tensorboard/${name}
  num_lookahead: 0
  
  feature:
    frame_size: 480
    hop_size: 160
    fft_size: 512
    type: mel
    bins: 72

  standardization: true
  
  model:
    config_dir: ./soundkit/models/arch_configs
    config_file: config_crnn.yaml
  
  debug: false

evaluate:
  epoch_loaded: best

  data: 
    dir: "./wavs/se/test_wavs"
    files: [keyboard_steak.wav, i_like_steak.wav, steak_hairdryer.wav]
    # # dir: ./wavs/LibriSpeech/test-clean
    # # files:
    result_folder: ${job_dir}/test_results/${name}

export:
  epoch_loaded: best
  tflite_dir: ${job_dir}/tflite

demo:
  epoch_loaded: best
  tflite_dir: ${job_dir}/tflite
  evb_dir: ${job_dir}/evb
  pre_gain: 1

📁 Step 1: Prepare Data¶

In [4]:

Copied!

!soundkit -t se -m data -c configs/se.yaml
!soundkit -t se -m data -c configs/se.yaml

2025-05-25 01:16:00.130646: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-05-25 01:16:00.130671: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-05-25 01:16:00.131318: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-05-25 01:16:00.135195: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-05-25 01:16:00.690089: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2025-05-25 01:16:01.286619: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:16:01.324943: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:16:01.325075: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:16:01.325955: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:16:01.326050: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:16:01.326250: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:16:01.378412: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:16:01.378542: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:16:01.378630: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:16:01.378694: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 19180 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6
Mode: data, Task: se
Processing [train] set with [ESC-50-master] noise
Processing [train] set with [FSD50K] noise                                      
Processing [train] set with [musan] noise                                       
Processing [train] set with [wham_noise] noise                                  
Processing [val] set with [ESC-50-master] noise                                 
Processing [val] set with [FSD50K] noise                                        
Processing [val] set with [musan] noise                                         
Processing [val] set with [wham_noise] noise

🎯 Step 2: Train Model¶

In [7]:

Copied!

!soundkit -t se -m train -c configs/se.yaml
!soundkit -t se -m train -c configs/se.yaml

2025-05-25 01:18:28.018874: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-05-25 01:18:28.018901: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-05-25 01:18:28.019559: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-05-25 01:18:28.023413: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-05-25 01:18:28.580924: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2025-05-25 01:18:29.155739: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:18:29.193461: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:18:29.193582: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:18:29.194454: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:18:29.194561: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:18:29.194726: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:18:29.249032: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:18:29.249183: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:18:29.249276: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:18:29.249342: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 19203 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6
Mode: train, Task: se
Training SE model with params: name='crnn_experiment' project='se' job_dir=PosixPath('soundkit/tasks/se') data={'path_tfrecord': './soundkit/tasks/se/tfrecords', 'tfrecord_datalist_name': {'train': 'train_tfrecord.csv', 'val': 'val_tfrecord.csv'}, 'num_samples_per_noise': {'train': 1000, 'val': 250}, 'force_download': False, 'reverb_prob': 0.2, 'num_processes': 8, 'corpora': [{'name': 'train-clean-360', 'type': 'train', 'path': 'wavs/LibriSpeech/train-clean-360'}, {'name': 'train-clean-100', 'type': 'train', 'path': 'wavs/LibriSpeech/train-clean-100'}, {'name': 'dev-clean', 'type': 'val', 'path': 'wavs/LibriSpeech/dev-clean'}, {'name': 'thchs30', 'type': 'train-val', 'path': {'train': 'wavs/data_thchs30/train', 'dev': 'wavs/data_thchs30/dev'}}, {'name': 'ESC-50-master', 'type': 'noise'}, {'name': 'FSD50K', 'type': 'noise'}, {'name': 'musan', 'type': 'noise'}, {'name': 'wham_noise', 'type': 'noise'}, {'name': 'rirs_noises', 'type': 'reverb'}], 'snr_dbs': [-6, -3, 0, 3, 6, 9, 12, 15, 30], 'target_length_in_secs': 5, 'min_amp': 0.03, 'max_amp': 0.95, 'signal': {'sampling_rate': 16000, 'dc_removal': True}, 'debug': False} train={'initial_lr': 0.0004, 'batchsize': 32, 'epochs': 150, 'warmup_epochs': 5, 'epoch_loaded': 'random', 'loss_function': {'type': 'compressed_mse', 'params': {'exp': 0.6, 'eps': 1e-08}}, 'path': {'model_dir': './soundkit/tasks/se/models_trained/crnn_experiment', 'tensorboard_dir': './soundkit/tasks/se/tensorboard/crnn_experiment'}, 'num_lookahead': 0, 'feature': {'frame_size': 480, 'hop_size': 160, 'fft_size': 512, 'type': 'mel', 'bins': 72}, 'standardization': True, 'model': {'config_dir': './soundkit/models/arch_configs', 'config_file': 'config_crnn.yaml'}, 'debug': False} evaluate={'epoch_loaded': 'best', 'data': {'dir': './wavs/se/test_wavs', 'files': ['keyboard_steak.wav', 'i_like_steak.wav', 'steak_hairdryer.wav'], 'result_folder': './soundkit/tasks/se/test_results/crnn_experiment'}} export={'epoch_loaded': 'best', 'tflite_dir': './soundkit/tasks/se/tflite'} demo={'epoch_loaded': 'best', 'tflite_dir': './soundkit/tasks/se/tflite', 'evb_dir': './soundkit/tasks/se/evb', 'pre_gain': 1} and more
Downloading model weights for crnn_experiment to soundkit/tasks/se
Model: "crnn"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dropout (Dropout)           multiple                  0         
                                                                 
 conv2d (Conv2D)             multiple                  36372     
                                                                 
 dropout_1 (Dropout)         multiple                  0         
                                                                 
 lstm (LSTM)                 multiple                  56784     
                                                                 
 dropout_2 (Dropout)         multiple                  0         
                                                                 
 dense (Dense)               multiple                  7140      
                                                                 
 dropout_3 (Dropout)         multiple                  0         
                                                                 
 dense_1 (Dense)             multiple                  7140      
                                                                 
 dropout_4 (Dropout)         multiple                  0         
                                                                 
 dense_2 (Dense)             multiple                  21845     
                                                                 
=================================================================
Total params: 146177 (571.00 KB)
Trainable params: 129281 (505.00 KB)
Non-trainable params: 16896 (66.00 KB)
_________________________________________________________________
Epoch 0/150

2025-05-25 01:18:34.679417: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:454] Loaded cuDNN version 8904
2025-05-25 01:18:34.964459: I external/local_xla/xla/service/service.cc:168] XLA service 0x72a65173acf0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2025-05-25 01:18:34.964480: I external/local_xla/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA GeForce RTX 3090, Compute Capability 8.6
2025-05-25 01:18:34.968224: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1748161115.037346 1438135 device_compiler.h:186] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
  [train] | 125/125 |     Loss: 11.2496 | 

  [val] | 31/31 |     Loss: 11.0736 | 

Epoch 1/150

  [train] | 125/125 |     Loss: 10.1950 | 

  [val] | 31/31 |     Loss: 9.3877 | 

Epoch 2/150

  [train] | 125/125 |     Loss: 8.6472 | 

  [val] | 31/31 |     Loss: 7.8945 | 

Epoch 3/150

  [train] | 125/125 |     Loss: 7.8091 | 

  [val] | 31/31 |     Loss: 7.1544 | 

Epoch 4/150

  [train] | 125/125 |     Loss: 7.3638 | 

  [val] | 31/31 |     Loss: 6.5285 | 

Epoch 5/150

  [train] | 125/125 |     Loss: 7.0209 | 

  [val] | 31/31 |     Loss: 6.3599 | 

Epoch 6/150

  [train] | 125/125 |     Loss: 6.7351 | 

  [val] | 31/31 |     Loss: 6.0633 | 

Epoch 7/150

  [train] | 125/125 |     Loss: 6.5289 | 

  [val] | 31/31 |     Loss: 6.0435 | 

Epoch 8/150

  [train] | 125/125 |     Loss: 6.3136 | 

  [val] | 31/31 |     Loss: 5.8874 | 

Epoch 9/150

  [train] | 125/125 |     Loss: 6.2481 | 

  [val] | 31/31 |     Loss: 5.8651 | 

Epoch 10/150

  [train] | 125/125 |     Loss: 6.0919 | 

  [val] | 31/31 |     Loss: 5.7019 | 

Epoch 11/150

  [train] | 125/125 |     Loss: 5.9949 | 

  [val] | 31/31 |     Loss: 5.8791 | 

Epoch 12/150

  [train] | 125/125 |     Loss: 5.9605 | 

  [val] | 31/31 |     Loss: 5.6243 | 

Epoch 13/150

  [train] | 125/125 |     Loss: 5.8215 | 

  [val] | 31/31 |     Loss: 5.6699 | 

Epoch 14/150

  [train] | 125/125 |     Loss: 5.7624 | 

  [val] | 31/31 |     Loss: 5.5465 | 

Epoch 15/150

  [train] | 125/125 |     Loss: 5.7405 | 

  [val] | 31/31 |     Loss: 5.4573 | 

Epoch 16/150

  [train] | 125/125 |     Loss: 5.6025 | 

  [val] | 31/31 |     Loss: 5.4954 | 

Epoch 17/150

  [train] | 125/125 |     Loss: 5.5708 | 

  [val] | 31/31 |     Loss: 5.2935 | 

Epoch 18/150

  [train] | 125/125 |     Loss: 5.5097 | 

  [val] | 31/31 |     Loss: 5.2901 | 

Epoch 19/150

  [train] | 125/125 |     Loss: 5.4354 | 

  [val] | 31/31 |     Loss: 5.3616 | 

Epoch 20/150

  [train] | 125/125 |     Loss: 5.3939 | 

  [val] | 31/31 |     Loss: 5.3062 | 

Epoch 21/150

  [train] | 125/125 |     Loss: 5.3996 | 

  [val] | 31/31 |     Loss: 5.1070 | 

Epoch 22/150

Traceback (most recent call last):730 | 
  File "/home/poyupaulchen/Documents/soundkit/.venv/bin/soundkit", line 8, in <module>
    sys.exit(main())
  File "/home/poyupaulchen/Documents/soundkit/soundkit/cli.py", line 86, in main
    parser()
  File "/home/poyupaulchen/Documents/soundkit/.venv/lib/python3.10/site-packages/argdantic/core.py", line 150, in __call__
    return args.__func__(args)
  File "/home/poyupaulchen/Documents/soundkit/.venv/lib/python3.10/site-packages/argdantic/core.py", line 82, in __call__
    return self.callback(**destructured)
  File "/home/poyupaulchen/Documents/soundkit/soundkit/cli.py", line 73, in run
    task_handler.train(params)
  File "/home/poyupaulchen/Documents/soundkit/soundkit/tasks/se/__init__.py", line 35, in train
    train(params)
  File "/home/poyupaulchen/Documents/soundkit/soundkit/tasks/se/train.py", line 297, in train
    loss = run_epoch(
  File "/home/poyupaulchen/Documents/soundkit/soundkit/tasks/se/train.py", line 117, in run_epoch
    feat_sn, spec_sn, states_audio_sn = feat_extractor(
  File "/home/poyupaulchen/Documents/soundkit/soundkit/utils/feature_utils.py", line 70, in __call__
    states_udpate = tf.identity(audio_sn[:, -overlap:])
  File "/home/poyupaulchen/Documents/soundkit/.venv/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
    return fn(*args, **kwargs)
  File "/home/poyupaulchen/Documents/soundkit/.venv/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py", line 1260, in op_dispatch_handler
    return dispatch_target(*args, **kwargs)
  File "/home/poyupaulchen/Documents/soundkit/.venv/lib/python3.10/site-packages/tensorflow/python/ops/array_ops.py", line 1130, in _slice_helper
    array_ops_stack.stack(begin),
  File "/home/poyupaulchen/Documents/soundkit/.venv/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
    return fn(*args, **kwargs)
  File "/home/poyupaulchen/Documents/soundkit/.venv/lib/python3.10/site-packages/tensorflow/python/util/dispatch.py", line 1260, in op_dispatch_handler
    return dispatch_target(*args, **kwargs)
  File "/home/poyupaulchen/Documents/soundkit/.venv/lib/python3.10/site-packages/tensorflow/python/ops/array_ops_stack.py", line 74, in stack
    return ops.convert_to_tensor(values, name=name)
  File "/home/poyupaulchen/Documents/soundkit/.venv/lib/python3.10/site-packages/tensorflow/python/profiler/trace.py", line 183, in wrapped
    return func(*args, **kwargs)
  File "/home/poyupaulchen/Documents/soundkit/.venv/lib/python3.10/site-packages/tensorflow/python/framework/ops.py", line 696, in convert_to_tensor
    return tensor_conversion_registry.convert(
  File "/home/poyupaulchen/Documents/soundkit/.venv/lib/python3.10/site-packages/tensorflow/python/framework/tensor_conversion_registry.py", line 234, in convert
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/poyupaulchen/Documents/soundkit/.venv/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 335, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/home/poyupaulchen/Documents/soundkit/.venv/lib/python3.10/site-packages/tensorflow/python/ops/weak_tensor_ops.py", line 142, in wrapper
    return op(*args, **kwargs)
  File "/home/poyupaulchen/Documents/soundkit/.venv/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 271, in constant
    return _constant_impl(value, dtype, shape, name, verify_shape=False,
  File "/home/poyupaulchen/Documents/soundkit/.venv/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 284, in _constant_impl
    return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
  File "/home/poyupaulchen/Documents/soundkit/.venv/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 296, in _constant_eager_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/home/poyupaulchen/Documents/soundkit/.venv/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 103, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
KeyboardInterrupt
^C

To monitor training with TensorBoard:

In [6]:

Copied!

!soundkit -t se -m train --tensorboard -c configs/se.yaml
# Then run this in another terminal:
# tensorboard --logdir=soundkit/tasks/se/your_tensorboard_dir
!soundkit -t se -m train --tensorboard -c configs/se.yaml
# Then run this in another terminal:
# tensorboard --logdir=soundkit/tasks/se/your_tensorboard_dir

2025-05-25 01:18:11.307317: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-05-25 01:18:11.307343: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-05-25 01:18:11.308028: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-05-25 01:18:11.311922: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-05-25 01:18:11.878821: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2025-05-25 01:18:12.469211: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:18:12.506868: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:18:12.507001: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:18:12.508036: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:18:12.508132: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:18:12.508214: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:18:12.555868: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:18:12.555991: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:18:12.556082: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:18:12.556147: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 19200 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6
Mode: train, Task: se
🚀 Launching TensorBoard at: soundkit/tasks/se/tensorboard
2025-05-25 01:18:15.484963: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-05-25 01:18:15.484991: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-05-25 01:18:15.485663: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-05-25 01:18:15.489396: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-05-25 01:18:16.060973: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2025-05-25 01:18:16.596399: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:18:16.633784: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-05-25 01:18:16.633942: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355

NOTE: Using experimental fast data loading logic. To disable, pass
    "--load_fast=false" and report issues on GitHub. More details:
    https://github.com/tensorflow/tensorboard/issues/4784

Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.15.2 at http://localhost:6007/ (Press CTRL+C to quit)
^C

📊 Step 3: Evaluate Model¶

In [ ]:

Copied!

!soundkit -t se -m evaluate -c configs/se.yaml
!soundkit -t se -m evaluate -c configs/se.yaml

📤 Step 4: Export Model to TFLite¶

In [ ]:

Copied!

!soundkit -t se -m export -c configs/se.yaml
!soundkit -t se -m export -c configs/se.yaml

🚀 Step 5: Deploy to Embedded EVB (e.g., Apollo)¶

In [ ]:

Copied!

!soundkit -t se -m demo -c configs/se.yaml demo.platform=evb # For EVB demo, ensure you have a microphone connected
!soundkit -t se -m demo -c configs/se.yaml demo.platform=pc # For PC demo, ensure you have a microphone connected
!soundkit -t se -m demo -c configs/se.yaml demo.platform=evb # For EVB demo, ensure you have a microphone connected
!soundkit -t se -m demo -c configs/se.yaml demo.platform=pc # For PC demo, ensure you have a microphone connected

2025-06-09 00:12:14.811082: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-06-09 00:12:14.811107: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-06-09 00:12:14.811958: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-06-09 00:12:14.816316: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-06-09 00:12:15.374533: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2025-06-09 00:12:15.940246: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-06-09 00:12:15.976081: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-06-09 00:12:15.976214: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-06-09 00:12:15.977362: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-06-09 00:12:15.977453: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-06-09 00:12:15.977532: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-06-09 00:12:16.033370: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-06-09 00:12:16.033500: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-06-09 00:12:16.033593: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-06-09 00:12:16.033658: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 20841 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6
🔧 Mode: demo, Task: se
🛠️  Overrides: ['demo.platform=evb']
Traceback (most recent call last):
  File "/home/poyupaulchen/Documents/soundkit/.venv/bin/soundkit", line 8, in <module>
    sys.exit(main())
  File "/home/poyupaulchen/Documents/soundkit/soundkit/cli.py", line 77, in main
    run_task(
  File "/home/poyupaulchen/Documents/soundkit/soundkit/cli.py", line 32, in run_task
    params = parse_config(config, overrides=extra_overrides)
  File "/home/poyupaulchen/Documents/soundkit/soundkit/cli.py", line 17, in parse_config
    raise FileNotFoundError(f"Config file not found: {path}")
FileNotFoundError: Config file not found: configs/se.yaml

✅ Notes¶

Edit se.yaml to customize parameters like sampling rate, model type, loss function
Change training and export paths as needed
Log and spectrograms are saved in their respective folders