# Model Evaluation

## Introduction
Evaluate mode tests the model's performance on the reserved test set for the specified task. As with training, the routine can be customized via a CLI configuration file or by setting the parameters directly in code. Evaluation measures the model's accuracy, precision, recall, and F1 score on the held-out test data, and the resulting metrics and reports are saved to the `job_dir`. At a high level, evaluation performs the following steps (a minimal code sketch follows the list):
- Load the configuration data (e.g. `configuration.json` (1))
- Load features (e.g. `FS-W-A-5`)
- Load the trained model (e.g. `model.keras`)
- Define the metrics (e.g. `accuracy`)
- Evaluate the model (e.g. `model.evaluate`)
- Generate evaluation report (e.g. `report.json`)
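To make these steps concrete, here is a minimal, self-contained Keras sketch of the same flow. Everything in it is illustrative: the stand-in model, the random test split, and the `(240, 5)` input shape (loosely taken from `frame_size` and the 5-feature `FS-W-A-5` set) are placeholders, not the framework's actual loaders.

```python
import json

import keras
import numpy as np

# Stand-in for the trained model; in practice this would be
# keras.models.load_model("<job_dir>/model.keras").
model = keras.Sequential([
    keras.layers.Input(shape=(240, 5)),
    keras.layers.GlobalAveragePooling1D(),
    keras.layers.Dense(2, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", metrics=["accuracy"])

# Placeholder test split; in practice it comes from the configured feature set.
test_x = np.random.rand(2500, 240, 5).astype("float32")
test_y = keras.utils.to_categorical(np.random.randint(0, 2, size=2500), num_classes=2)

# Evaluate against the compiled metrics and collect named results.
results = model.evaluate(test_x, test_y, return_dict=True)

# Persist the evaluation report, mirroring the report.json step above.
with open("report.json", "w") as fp:
    json.dump({k: float(v) for k, v in results.items()}, fp, indent=2)
```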
- Example configuration:
{ "name": "sd-2-tcn-sm", "job_dir": "./results/sd-2-tcn-sm", "verbose": 2, "datasets": [{ "name": "cmidss", "params": { "path": "./datasets/cmidss" } }], "feature": { "name": "FS-W-A-5", "sampling_rate": 0.2, "frame_size": 12, "loader": "hdf5", "feat_key": "features", "label_key": "detect_labels", "mask_key": "mask", "feat_cols": null, "save_path": "./datasets/store/fs-w-a-5-60", "params": {} }, "sampling_rate": 0.0083333, "frame_size": 240, "num_classes": 2, "class_map": { "0": 0, "1": 1, "2": 1, "3": 1, "4": 1, "5": 1 }, "class_names": ["WAKE", "SLEEP"], "samples_per_subject": 100, "val_samples_per_subject": 100, "test_samples_per_subject": 50, "val_size": 4000, "test_size": 2500, "val_subjects": 0.20, "batch_size": 128, "buffer_size": 10000, "epochs": 200, "steps_per_epoch": 25, "val_steps_per_epoch": 25, "val_metric": "loss", "lr_rate": 1e-3, "lr_cycles": 1, "label_smoothing": 0, "test_metric": "f1", "test_metric_threshold": 0.02, "tflm_var_name": "sk_detect_flatbuffer", "tflm_file": "sk_detect_flatbuffer.h", "backend": "pc", "display_report": true, "quantization": { "qat": false, "mode": "INT8", "io_type": "int8", "concrete": true, "debug": false }, "model_file": "model.keras", "use_logits": false, "architecture": { "name": "tcn", "params": { "input_kernel": [1, 5], "input_norm": "batch", "blocks": [ {"depth": 1, "branch": 1, "filters": 16, "kernel": [1, 5], "dilation": [1, 1], "dropout": 0.10, "ex_ratio": 1, "se_ratio": 4, "norm": "batch"}, {"depth": 1, "branch": 1, "filters": 32, "kernel": [1, 5], "dilation": [1, 2], "dropout": 0.10, "ex_ratio": 1, "se_ratio": 4, "norm": "batch"}, {"depth": 1, "branch": 1, "filters": 48, "kernel": [1, 5], "dilation": [1, 4], "dropout": 0.10, "ex_ratio": 1, "se_ratio": 4, "norm": "batch"}, {"depth": 1, "branch": 1, "filters": 64, "kernel": [1, 5], "dilation": [1, 8], "dropout": 0.10, "ex_ratio": 1, "se_ratio": 4, "norm": "batch"} ], "output_kernel": [1, 5], "include_top": true, "use_logits": true, "model_name": "tcn" } } }
```mermaid
graph LR
A("`Load
configuration
__TaskParams__
`")
B("`Load
features
__FeatureFactory__
`")
C("`Load trained
__model__
`")
D("`Define
__metrics__
`")
E("`Evaluate
__model__
`")
F("`Generate
__report__
`")
A ==> B
B ==> C
subgraph CF["Evaluate"]
C ==> D
D ==> E
end
E ==> F
```
## Usage

### CLI
The following command will evaluate a detect model using the reference configuration.
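A representative invocation, assuming the package installs a `sleepkit` console script with `--task`, `--mode`, and `--config` flags (verify against the installed CLI's help output):

```bash
sleepkit --task detect --mode evaluate --config ./configuration.json
```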
### Python
The model can be evaluated using the following snippet:
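A minimal sketch, assuming the package is importable as `sleepkit` and exposes the `TaskFactory` and `TaskParams` referenced above (treat the exact import name and factory API as assumptions):

```python
import json

import sleepkit as sk  # assumed package import name

# Load the reference configuration into TaskParams.
with open("./configuration.json", "r", encoding="utf-8") as fp:
    params = sk.TaskParams(**json.load(fp))

# Fetch the detect task and run evaluation; results land in params.job_dir.
task = sk.TaskFactory.get("detect")
task.evaluate(params)
```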
- Example configuration: see the reference configuration shown above.
## Arguments
Please refer to `TaskParams` for the list of arguments that can be used with the `evaluate` command.