
Sleep Stage Models

Model Overview

The following table lists the latest pre-trained models for sleep stage classification. Additional details for each model, including training configuration, accuracy metrics, and hardware performance results, are provided below.

| NAME | LOCATION | # CLASSES | MODEL | PARAMS | FLOPS | ACCURACY | AP |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SS-2-TCN-SM | Wrist | 2 | TCN | 10K | 1.7M/hr | 88.9% | 96.0% |
| SS-3-TCN-SM | Wrist | 3 | TCN | 11K | 1.9M/hr | 84.4% | 91.8% |
| SS-4-TCN-SM | Wrist | 4 | TCN | 26K | 4.2M/hr | 75.5% | 81.6% |
| SS-5-TCN-SM | Wrist | 5 | TCN | 28K | 4.6M/hr | 68.4% | 74.1% |

Model Details

The SS-2-TCN-SM model is a 2-stage sleep stage model that uses a Temporal Convolutional Network (TCN) architecture. The model is trained on PPG and IMU data collected from the wrist and classifies sleep and wake stages. It requires only 10K parameters and achieves an accuracy of 88.9% and an average AP score of 96.0%.

  • Location: Wrist
  • Classes: Awake, Sleep
  • Frame Size: 2 hours
  • Datasets: MESA
  • Features: FS-W-PA-14
| Base Class | Target Class | Label |
| --- | --- | --- |
| 0-WAKE | 0 | WAKE |
| 1-N1, 2-N2, 3-N3, 5-REM | 1 | SLEEP |

The SS-3-TCN-SM model is a 3-stage sleep stage model that uses a Temporal Convolutional Network (TCN) architecture. The model is trained on PPG and IMU data collected from the wrist and classifies awake, NREM, and REM stages. It requires only 11K parameters and achieves an accuracy of 84.4% and an average AP score of 91.8%.

  • Location: Wrist
  • Classes: Awake, NREM, REM
  • Frame Size: 2 hours
  • Datasets: MESA
  • Features: FS-W-PA-14
| Base Class | Target Class | Label |
| --- | --- | --- |
| 0-WAKE | 0 | WAKE |
| 1-N1, 2-N2, 3-N3 | 1 | NREM |
| 5-REM | 2 | REM |

The SS-4-TCN-SM model is a 4-stage sleep stage model that uses a Temporal Convolutional Network (TCN) architecture. The model is trained on PPG and IMU data collected from the wrist and classifies awake, core, deep, and REM stages. It requires only 26K parameters and achieves an accuracy of 75.5% and an average AP score of 81.6%.

  • Location: Wrist
  • Classes: Awake, Core, Deep, REM
  • Frame Size: 2 hours
  • Datasets: MESA
  • Features: FS-W-PA-14
| Base Class | Target Class | Label |
| --- | --- | --- |
| 0-WAKE | 0 | WAKE |
| 1-N1, 2-N2 | 1 | CORE |
| 3-N3 | 2 | DEEP |
| 5-REM | 3 | REM |

The SS-5-TCN-SM model is a 5-stage sleep stage model that uses a Temporal Convolutional Network (TCN) architecture. The model is trained on PPG and IMU data collected from the wrist and classifies awake, N1, N2, N3, and REM stages. It requires only 28K parameters and achieves an accuracy of 68.4% and an average AP score of 74.1%.

  • Location: Wrist
  • Classes: Awake, N1, N2, N3, REM
  • Frame Size: 2 hours
  • Datasets: MESA
  • Features: FS-W-PA-14
| Base Class | Target Class | Label |
| --- | --- | --- |
| 0-WAKE | 0 | WAKE |
| 1-N1 | 1 | N1 |
| 2-N2 | 2 | N2 |
| 3-N3 | 3 | N3 |
| 5-REM | 4 | REM |
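
The label remapping follows the same pattern across all four models. As a minimal sketch (the mappings come from the tables above; the helper itself is illustrative, not part of SleepKit's API):

```python
# Remapping AASM base stages to each model's target classes, per the
# tables above. Base stage encoding: 0-WAKE, 1-N1, 2-N2, 3-N3, 5-REM.
import numpy as np

STAGE_MAPS = {
    "SS-2-TCN-SM": {0: 0, 1: 1, 2: 1, 3: 1, 5: 1},  # WAKE, SLEEP
    "SS-3-TCN-SM": {0: 0, 1: 1, 2: 1, 3: 1, 5: 2},  # WAKE, NREM, REM
    "SS-4-TCN-SM": {0: 0, 1: 1, 2: 1, 3: 2, 5: 3},  # WAKE, CORE, DEEP, REM
    "SS-5-TCN-SM": {0: 0, 1: 1, 2: 2, 3: 3, 5: 4},  # WAKE, N1, N2, N3, REM
}

def remap_stages(base_labels: np.ndarray, model: str) -> np.ndarray:
    """Map a sequence of base sleep-stage labels to a model's target classes."""
    return np.vectorize(STAGE_MAPS[model].get)(base_labels)

# Example: a few 30-second epochs scored with base stages
epochs = np.array([0, 0, 1, 2, 2, 3, 5, 5])
print(remap_stages(epochs, "SS-4-TCN-SM"))  # -> [0 0 1 1 1 2 3 3]
```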

Model Performance

[Figures per model: confusion matrix, sleep efficiency plot, and total sleep time (TST) plot]

EVB Performance

The following table provides the latest performance and accuracy results for all models when running on the Apollo4 Plus EVB. These results are obtained using neuralSPOT's Autodeploy tool. From the neuralSPOT repo, the following command can be used to capture EVB results via Autodeploy:

python -m ns_autodeploy \
--tflite-filename model.tflite \
--model-name model \
--cpu-mode 192 \
--arena-size-scratch-buffer-padding 0 \
--max-arena-size 80
| Name | Params | FLOPS | Metric | Time | Arena | Energy |
| --- | --- | --- | --- | --- | --- | --- |
| SS-2-TCN-SM | 10K | 1.7M/hr | 96.0% AP | 66ms/hr | 32KB | 705uJ/hr |
| SS-3-TCN-SM | 11K | 1.9M/hr | 91.8% AP | 75ms/hr | 34KB | 710uJ/hr |
| SS-4-TCN-SM | 26K | 4.2M/hr | 81.6% AP | 129ms/hr | 47KB | 1.23mJ/hr |
| SS-5-TCN-SM | 28K | 4.6M/hr | 74.1% AP | 172ms/hr | 50KB | 1.65mJ/hr |
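
Since the costs above are reported per hour of data, scaling to a full recording is straightforward. A quick back-of-the-envelope sketch (the 8-hour night is an assumption):

```python
# Scale the per-hour EVB costs from the table above to an 8-hour night.
HOURS_PER_NIGHT = 8

models = {
    # name: (latency in ms/hr, energy in uJ/hr)
    "SS-2-TCN-SM": (66, 705),
    "SS-3-TCN-SM": (75, 710),
    "SS-4-TCN-SM": (129, 1230),
    "SS-5-TCN-SM": (172, 1650),
}

for name, (ms_per_hr, uj_per_hr) in models.items():
    print(f"{name}: {ms_per_hr * HOURS_PER_NIGHT} ms, "
          f"{uj_per_hr * HOURS_PER_NIGHT / 1000:.1f} mJ per night")
```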

In addition, we can capture statistics from each layer. The following bar plot shows the latency of each block in the 4-stage sleep classification TCN model. ENC refers to the initial 1-D separable convolutional encoder layer, B1.1 refers to all the layers in block 1 at depth 1, B1.2 to block 1 at depth 2, and so on. Latency increases deeper into the network due to the growing number of channels. The final DEC layer is the decoder, a 1-D convolutional layer with 4 output channels (one per class).
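
For reference, the ENC -> blocks -> DEC layout described above can be sketched in Keras as follows. The filter counts, kernel sizes, and dilation rates here are illustrative assumptions, not the shipped model's exact hyperparameters:

```python
# Illustrative TCN layout: separable-conv encoder, dilated residual
# blocks of increasing width, and a 1-D convolutional decoder.
import keras

def tcn_block(x, filters, dilation):
    """One residual block: dilated separable conv with a skip connection."""
    skip = x
    x = keras.layers.SeparableConv1D(filters, 5, padding="same",
                                     dilation_rate=dilation,
                                     activation="relu")(x)
    x = keras.layers.BatchNormalization()(x)
    if skip.shape[-1] != filters:
        # Project the skip path when the channel count changes
        skip = keras.layers.Conv1D(filters, 1, padding="same")(skip)
    return keras.layers.Add()([x, skip])

def build_tcn(num_classes=4, num_features=14):
    inputs = keras.Input(shape=(None, num_features))
    # ENC: initial 1-D separable convolutional encoder
    x = keras.layers.SeparableConv1D(16, 5, padding="same",
                                     activation="relu")(inputs)
    for filters in (16, 24, 32):      # B1, B2, B3: widening channels
        for depth in (1, 2):          # B*.1, B*.2
            x = tcn_block(x, filters, dilation=2 ** depth)
    # DEC: 1-D convolutional decoder, one output channel per class
    outputs = keras.layers.Conv1D(num_classes, 1, activation="softmax")(x)
    return keras.Model(inputs, outputs)

model = build_tcn()
model.summary()
```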


Comparison

We compare our 3-stage and 4-stage models to the SLAMSS model from Song et al., 2023. Their model was also trained on the MESA dataset using only motor and cardiac physiological signals. In particular, they extract activity count, heart rate, and heart rate standard deviation in 30-second epochs and feed 12 epochs (6 minutes) of these features as input to the network. The network consists of three 1-D CNN layers, two LSTM layers, and one attention layer. The underlying design of the attention block is unclear, but the three CNN and two LSTM layers alone require roughly 8.8 MFLOPS per epoch. This equates to roughly 250X more computation (1,056 MFLOPS/hr) compared to our 4-stage sleep classification model (4.2 MFLOPS/hr).
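
The per-hour figure follows from the 30-second epoch rate; a quick check of the arithmetic:

```python
# SLAMSS classifies each 30-second epoch, i.e. 120 inferences per hour.
slamss_mflops_per_epoch = 8.8
epochs_per_hour = 3600 // 30                                         # 120
slamss_mflops_per_hour = slamss_mflops_per_epoch * epochs_per_hour   # 1,056

ours_mflops_per_hour = 4.2  # SS-4-TCN-SM, from the tables above
print(slamss_mflops_per_hour / ours_mflops_per_hour)                 # ~250x
```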

3-Stage Sleep Classification (MESA)

| Reference | Acc | F1 | WAKE | NREM | REM |
| --- | --- | --- | --- | --- | --- |
| Song et al., 2023 | 79.1 | 80.0 | 78.0 | 81.8 | 70.9 |
| SleepKit | 83.9 | 84.2 | 80.3 | 86.6 | 83.5 |

4-Stage Sleep Classification (MESA)

| Reference | Acc | F1 | WAKE | CORE | DEEP | REM |
| --- | --- | --- | --- | --- | --- | --- |
| Song et al., 2023 | 70.0 | 72.0 | 78.7 | 66.3 | 55.9 | 63.0 |
| SleepKit | 75.8 | 76.4 | 80.6 | 73.9 | 52.2 | 81.7 |

Downloads

Each model provides the following assets:

| Asset | Description |
| --- | --- |
| configuration.json | Configuration file |
| model.keras | Keras model file |
| model.tflite | TFLite model file |
| metrics.json | Metrics file |
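
As a minimal sketch, the model.tflite asset can be exercised with the standard TensorFlow Lite interpreter; input shapes and dtypes are queried from the model rather than assumed:

```python
# Running the downloaded model.tflite with the TensorFlow Lite interpreter.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Dummy input with the model's expected shape and dtype; replace with
# real FS-W-PA-14 features computed over a 2-hour frame.
x = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()

probs = interpreter.get_tensor(out["index"])
print("predicted stages:", probs.argmax(axis=-1))
```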