# Model Training

## Introduction

Each task provides a mode to train a model on the specified features. The training mode can be invoked either via the CLI or from the `sleepkit` Python package. At a high level, the training mode performs the following actions based on the provided configuration parameters:
- Load the configuration data (e.g. `configuration.json`)
- Load features (e.g. `FS-W-A-5`)
- Initialize custom model architecture (e.g. `tcn`)
- Define the metrics, loss, and optimizer (e.g. `accuracy`, `categorical_crossentropy`, `adam`)
- Train the model (e.g. `model.fit`)
- Save artifacts (e.g. `model.keras`)
Example configuration:

```json
{
  "name": "sd-2-tcn-sm",
  "job_dir": "./results/sd-2-tcn-sm",
  "verbose": 2,
  "datasets": [{
    "name": "cmidss",
    "params": {
      "path": "./datasets/cmidss"
    }
  }],
  "feature": {
    "name": "FS-W-A-5",
    "sampling_rate": 0.2,
    "frame_size": 12,
    "loader": "hdf5",
    "feat_key": "features",
    "label_key": "detect_labels",
    "mask_key": "mask",
    "feat_cols": null,
    "save_path": "./datasets/store/fs-w-a-5-60",
    "params": {}
  },
  "sampling_rate": 0.0083333,
  "frame_size": 240,
  "num_classes": 2,
  "class_map": {"0": 0, "1": 1, "2": 1, "3": 1, "4": 1, "5": 1},
  "class_names": ["WAKE", "SLEEP"],
  "samples_per_subject": 100,
  "val_samples_per_subject": 100,
  "test_samples_per_subject": 50,
  "val_size": 4000,
  "test_size": 2500,
  "val_subjects": 0.20,
  "batch_size": 128,
  "buffer_size": 10000,
  "epochs": 200,
  "steps_per_epoch": 25,
  "val_steps_per_epoch": 25,
  "val_metric": "loss",
  "lr_rate": 1e-3,
  "lr_cycles": 1,
  "label_smoothing": 0,
  "test_metric": "f1",
  "test_metric_threshold": 0.02,
  "tflm_var_name": "sk_detect_flatbuffer",
  "tflm_file": "sk_detect_flatbuffer.h",
  "backend": "pc",
  "display_report": true,
  "quantization": {
    "qat": false,
    "mode": "INT8",
    "io_type": "int8",
    "concrete": true,
    "debug": false
  },
  "model_file": "model.keras",
  "use_logits": false,
  "architecture": {
    "name": "tcn",
    "params": {
      "input_kernel": [1, 5],
      "input_norm": "batch",
      "blocks": [
        {"depth": 1, "branch": 1, "filters": 16, "kernel": [1, 5], "dilation": [1, 1], "dropout": 0.10, "ex_ratio": 1, "se_ratio": 4, "norm": "batch"},
        {"depth": 1, "branch": 1, "filters": 32, "kernel": [1, 5], "dilation": [1, 2], "dropout": 0.10, "ex_ratio": 1, "se_ratio": 4, "norm": "batch"},
        {"depth": 1, "branch": 1, "filters": 48, "kernel": [1, 5], "dilation": [1, 4], "dropout": 0.10, "ex_ratio": 1, "se_ratio": 4, "norm": "batch"},
        {"depth": 1, "branch": 1, "filters": 64, "kernel": [1, 5], "dilation": [1, 8], "dropout": 0.10, "ex_ratio": 1, "se_ratio": 4, "norm": "batch"}
      ],
      "output_kernel": [1, 5],
      "include_top": true,
      "use_logits": true,
      "model_name": "tcn"
    }
  }
}
```
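Before launching a run, it can help to sanity-check the file with the standard library. This trimmed sketch parses a few of the fields above and confirms that `class_map` collapses the six raw labels onto `num_classes` output classes:

```python
import json

# Trimmed copy of the example configuration above.
config_text = """
{
  "name": "sd-2-tcn-sm",
  "num_classes": 2,
  "class_map": {"0": 0, "1": 1, "2": 1, "3": 1, "4": 1, "5": 1},
  "class_names": ["WAKE", "SLEEP"],
  "epochs": 200,
  "lr_rate": 1e-3
}
"""
config = json.loads(config_text)

# class_map sends the six raw labels onto the two output classes,
# and class_names must name each output class.
assert len(set(config["class_map"].values())) == config["num_classes"]
assert len(config["class_names"]) == config["num_classes"]
print(config["name"])  # sd-2-tcn-sm
```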
```mermaid
graph LR
A("`Load
configuration
__TaskParams__
`")
B("`Load
features
__FeatureFactory__
`")
C("`Initialize
model
__ModelFactory__
`")
D("`Define
_metrics_, _loss_,
_optimizer_
`")
E("`Train
__model__
`")
F("`Save
__artifacts__
`")
A ==> B
B ==> C
subgraph "Model Training"
C ==> D
D ==> E
end
E ==> F
```
## Usage

### CLI

The following command will train a sleep stage model using the reference configuration:
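An invocation might look like the following. The flag names and the `detect` task name (taken from the configuration above) are assumptions; run `sleepkit --help` to confirm the exact interface:

```shell
# Hypothetical CLI invocation (flag names are assumptions).
sleepkit --task detect --mode train --config ./configuration.json
```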
### Python

The model can be trained using the following snippet:
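A minimal sketch follows. `TaskParams` appears in the Arguments section below, but `TaskFactory`, the `detect` task name, and the exact call signatures are assumptions to be checked against the sleepkit API reference; the runnable portion only shows how the JSON configuration becomes the keyword arguments for such a call:

```python
import json

# Hypothetical sleepkit usage (TaskFactory, the "detect" task name, and the
# method signatures are assumptions -- verify against the sleepkit API docs):
#
#   import sleepkit as sk
#
#   params = sk.TaskParams(**json.load(open("configuration.json")))
#   task = sk.TaskFactory.get("detect")
#   task.train(params)

# Runnable stand-in: turn configuration JSON into the kwargs TaskParams would
# receive (trimmed copy of the example configuration above).
kwargs = json.loads('{"name": "sd-2-tcn-sm", "epochs": 200, "batch_size": 128}')
print(sorted(kwargs))  # ['batch_size', 'epochs', 'name']
```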
## Arguments

Please refer to `TaskParams` for the list of arguments that can be used with the `train` command.
## Logging

SleepKit provides built-in support for logging to several third-party services, including Weights & Biases (WANDB) and TensorBoard.
### WANDB

The training mode can log all metrics and artifacts (i.e. models) to Weights & Biases (WANDB). To enable WANDB logging, set the environment variable `WANDB=1`. Remember to sign in prior to running experiments by running `wandb login`.
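A typical flow might look like this; `wandb login` and `WANDB=1` are as documented above, while the training CLI flags are assumptions:

```shell
# Sign in once so runs can sync (prompts for an API key).
wandb login

# Enable WANDB logging for this training run (CLI flag names are assumptions).
WANDB=1 sleepkit --task detect --mode train --config ./configuration.json
```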
### TensorBoard

The training mode can log all metrics to TensorBoard. To enable TensorBoard logging, set the environment variable `TENSORBOARD=1`.
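For example, with `TENSORBOARD=1` set for the run, the logs can then be viewed with the standard `tensorboard` CLI. The training flags are assumptions; the `--logdir` path is the `job_dir` from the example configuration:

```shell
# Enable TensorBoard logging for the run (CLI flag names are assumptions).
TENSORBOARD=1 sleepkit --task detect --mode train --config ./configuration.json

# Point TensorBoard at the job directory from the configuration above.
tensorboard --logdir ./results/sd-2-tcn-sm
```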