
Sleep Stage Models

Model Overview

The following table lists the latest pre-trained models for sleep stage classification. Additional details for each model, including training configuration, accuracy metrics, and hardware performance results, are provided below.

| NAME | LOCATION | # CLASSES | MODEL | PARAMS | FLOPS | ACCURACY | AP |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SS-2-TCN-SM | Wrist | 2 | TCN | 10K | 1.7M/hr | 88.9% | 96.0% |
| SS-3-TCN-SM | Wrist | 3 | TCN | 11K | 1.9M/hr | 84.4% | 91.8% |
| SS-4-TCN-SM | Wrist | 4 | TCN | 26K | 4.2M/hr | 75.5% | 81.6% |
| SS-5-TCN-SM | Wrist | 5 | TCN | 28K | 4.6M/hr | 68.4% | 74.1% |

Model Details

The SS-2-TCN-SM model is a 2-stage sleep stage model that uses a Temporal Convolutional Network (TCN) architecture. The model is trained on PPG and IMU data collected from the wrist and classifies sleep and wake stages. It requires only 10K parameters and achieves an accuracy of 88.9% and an average AP score of 96.0%.

  • Location: Wrist
  • Classes: Awake, Sleep
  • Frame Size: 2 hours
  • Datasets: MESA
  • Features: FS-W-PA-14
| Base Class | Target Class | Label |
| --- | --- | --- |
| 0-WAKE | 0 | WAKE |
| 1-N1, 2-N2, 3-N3, 5-REM | 1 | SLEEP |

The SS-3-TCN-SM model is a 3-stage sleep stage model that uses a Temporal Convolutional Network (TCN) architecture. The model is trained on PPG and IMU data collected from the wrist and classifies awake, NREM, and REM stages. It requires only 11K parameters and achieves an accuracy of 84.4% and an average AP score of 91.8%.

  • Location: Wrist
  • Classes: Awake, NREM, REM
  • Frame Size: 2 hours
  • Datasets: MESA
  • Features: FS-W-PA-14
| Base Class | Target Class | Label |
| --- | --- | --- |
| 0-WAKE | 0 | WAKE |
| 1-N1, 2-N2, 3-N3 | 1 | NREM |
| 5-REM | 2 | REM |

The SS-4-TCN-SM model is a 4-stage sleep stage model that uses a Temporal Convolutional Network (TCN) architecture. The model is trained on PPG and IMU data collected from the wrist and classifies awake, core, deep, and REM stages. It requires only 26K parameters and achieves an accuracy of 75.5% and an average AP score of 81.6%.

  • Location: Wrist
  • Classes: Awake, Core, Deep, REM
  • Frame Size: 2 hours
  • Datasets: MESA
  • Features: FS-W-PA-14
| Base Class | Target Class | Label |
| --- | --- | --- |
| 0-WAKE | 0 | WAKE |
| 1-N1, 2-N2 | 1 | CORE |
| 3-N3 | 2 | DEEP |
| 5-REM | 3 | REM |

The SS-5-TCN-SM model is a 5-stage sleep stage model that uses a Temporal Convolutional Network (TCN) architecture. The model is trained on PPG and IMU data collected from the wrist and classifies awake, N1, N2, N3, and REM stages. It requires only 28K parameters and achieves an accuracy of 68.4% and an average AP score of 74.1%.

  • Location: Wrist
  • Classes: Awake, N1, N2, N3, REM
  • Frame Size: 2 hours
  • Datasets: MESA
  • Features: FS-W-PA-14
| Base Class | Target Class | Label |
| --- | --- | --- |
| 0-WAKE | 0 | WAKE |
| 1-N1 | 1 | N1 |
| 2-N2 | 2 | N2 |
| 3-N3 | 3 | N3 |
| 5-REM | 4 | REM |
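
The label remapping follows the same pattern across all four models. As a minimal sketch (the mappings come from the tables above; the helper itself is illustrative, not part of SleepKit's API):

```python
# Remapping AASM base stages to each model's target classes, per the
# tables above. Base stage encoding: 0-WAKE, 1-N1, 2-N2, 3-N3, 5-REM.
import numpy as np

STAGE_MAPS = {
    "SS-2-TCN-SM": {0: 0, 1: 1, 2: 1, 3: 1, 5: 1},  # WAKE, SLEEP
    "SS-3-TCN-SM": {0: 0, 1: 1, 2: 1, 3: 1, 5: 2},  # WAKE, NREM, REM
    "SS-4-TCN-SM": {0: 0, 1: 1, 2: 1, 3: 2, 5: 3},  # WAKE, CORE, DEEP, REM
    "SS-5-TCN-SM": {0: 0, 1: 1, 2: 2, 3: 3, 5: 4},  # WAKE, N1, N2, N3, REM
}

def remap_stages(base_labels: np.ndarray, model: str) -> np.ndarray:
    """Map a sequence of base sleep-stage labels to a model's target classes."""
    return np.vectorize(STAGE_MAPS[model].get)(base_labels)

# Example: a few 30-second epochs scored with base stages
epochs = np.array([0, 0, 1, 2, 2, 3, 5, 5])
print(remap_stages(epochs, "SS-4-TCN-SM"))  # -> [0 0 1 1 1 2 3 3]
```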

Model Performance

[Figures per model: confusion matrix, sleep efficiency plot, and total sleep time (TST) plot]

EVB Performance

The following table provides the latest performance and accuracy results for all models when running on the Apollo4 Plus EVB. These results are obtained using neuralSPOT's Autodeploy tool. From the neuralSPOT repo, the following command can be used to capture EVB results via Autodeploy:

python -m ns_autodeploy \
--tflite-filename model.tflite \
--model-name model \
--cpu-mode 192 \
--arena-size-scratch-buffer-padding 0 \
--max-arena-size 80
| Name | Params | FLOPS | Metric | Time | Arena | Energy |
| --- | --- | --- | --- | --- | --- | --- |
| SS-2-TCN-SM | 10K | 1.7M/hr | 96.0% AP | 66ms/hr | 32KB | 705uJ/hr |
| SS-3-TCN-SM | 11K | 1.9M/hr | 91.8% AP | 75ms/hr | 34KB | 710uJ/hr |
| SS-4-TCN-SM | 26K | 4.2M/hr | 81.6% AP | 129ms/hr | 47KB | 1.23mJ/hr |
| SS-5-TCN-SM | 28K | 4.6M/hr | 74.1% AP | 172ms/hr | 50KB | 1.65mJ/hr |
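
Since the costs above are reported per hour of data, scaling to a full recording is straightforward. A quick back-of-the-envelope sketch (the 8-hour night is an assumption):

```python
# Scale the per-hour EVB costs from the table above to an 8-hour night.
HOURS_PER_NIGHT = 8

models = {
    # name: (latency in ms/hr, energy in uJ/hr)
    "SS-2-TCN-SM": (66, 705),
    "SS-3-TCN-SM": (75, 710),
    "SS-4-TCN-SM": (129, 1230),
    "SS-5-TCN-SM": (172, 1650),
}

for name, (ms_per_hr, uj_per_hr) in models.items():
    print(f"{name}: {ms_per_hr * HOURS_PER_NIGHT} ms, "
          f"{uj_per_hr * HOURS_PER_NIGHT / 1000:.1f} mJ per night")
```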

In addition, we can capture statistics from each layer. The following bar plot shows the latency of each block in the 4-stage sleep classification TCN model. ENC refers to the initial 1-D separable convolutional encoder layer, B1.1 refers to all the layers in block 1 at depth 1, B1.2 to block 1 at depth 2, and so on. Latency increases deeper into the network due to the growing number of channels. The final DEC layer is the decoder, a 1-D convolutional layer with 4 output channels (one per class).
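
For reference, the ENC -> blocks -> DEC layout described above can be sketched in Keras as follows. The filter counts, kernel sizes, and dilation rates here are illustrative assumptions, not the shipped model's exact hyperparameters:

```python
# Illustrative TCN layout: separable-conv encoder, dilated residual
# blocks of increasing width, and a 1-D convolutional decoder.
import keras

def tcn_block(x, filters, dilation):
    """One residual block: dilated separable conv with a skip connection."""
    skip = x
    x = keras.layers.SeparableConv1D(filters, 5, padding="same",
                                     dilation_rate=dilation,
                                     activation="relu")(x)
    x = keras.layers.BatchNormalization()(x)
    if skip.shape[-1] != filters:
        # Project the skip path when the channel count changes
        skip = keras.layers.Conv1D(filters, 1, padding="same")(skip)
    return keras.layers.Add()([x, skip])

def build_tcn(num_classes=4, num_features=14):
    inputs = keras.Input(shape=(None, num_features))
    # ENC: initial 1-D separable convolutional encoder
    x = keras.layers.SeparableConv1D(16, 5, padding="same",
                                     activation="relu")(inputs)
    for filters in (16, 24, 32):      # B1, B2, B3: widening channels
        for depth in (1, 2):          # B*.1, B*.2
            x = tcn_block(x, filters, dilation=2 ** depth)
    # DEC: 1-D convolutional decoder, one output channel per class
    outputs = keras.layers.Conv1D(num_classes, 1, activation="softmax")(x)
    return keras.Model(inputs, outputs)

model = build_tcn()
model.summary()
```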


Comparison

We compare our 3-stage and 4-stage models to the SLAMSS model from Song et al., 2023. Their model was also trained on the MESA dataset using only motor and cardiac physiological signals. In particular, they extract activity count, heart rate, and heart rate standard deviation in 30-second epochs and feed 12 epochs (6 minutes) of these features as input to the network. The network consists of three 1-D CNN layers, two LSTM layers, and one attention layer. The underlying design of the attention block is unclear, but the three CNN and two LSTM layers alone require roughly 8.8 MFLOPS per epoch. This equates to roughly 250X more computation (1,056 MFLOPS/hr) compared to our 4-stage sleep classification model (4.2 MFLOPS/hr).
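
The per-hour figure follows from the 30-second epoch rate; a quick check of the arithmetic:

```python
# SLAMSS classifies each 30-second epoch, i.e. 120 inferences per hour.
slamss_mflops_per_epoch = 8.8
epochs_per_hour = 3600 // 30                                         # 120
slamss_mflops_per_hour = slamss_mflops_per_epoch * epochs_per_hour   # 1,056

ours_mflops_per_hour = 4.2  # SS-4-TCN-SM, from the tables above
print(slamss_mflops_per_hour / ours_mflops_per_hour)                 # ~250x
```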

3-Stage Sleep Classification (MESA)

| Reference | Acc | F1 | WAKE | NREM | REM |
| --- | --- | --- | --- | --- | --- |
| Song et al., 2023 | 79.1 | 80.0 | 78.0 | 81.8 | 70.9 |
| SleepKit | 83.9 | 84.2 | 80.3 | 86.6 | 83.5 |

4-Stage Sleep Classification (MESA)

| Reference | Acc | F1 | WAKE | CORE | DEEP | REM |
| --- | --- | --- | --- | --- | --- | --- |
| Song et al., 2023 | 70.0 | 72.0 | 78.7 | 66.3 | 55.9 | 63.0 |
| SleepKit | 75.8 | 76.4 | 80.6 | 73.9 | 52.2 | 81.7 |

Downloads

Each model provides the following assets:

| Asset | Description |
| --- | --- |
| configuration.json | Configuration file |
| model.keras | Keras model file |
| model.tflite | TFLite model file |
| metrics.json | Metrics file |
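
As a minimal sketch, the model.tflite asset can be exercised with the standard TensorFlow Lite interpreter; input shapes and dtypes are queried from the model rather than assumed:

```python
# Running the downloaded model.tflite with the TensorFlow Lite interpreter.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Dummy input with the model's expected shape and dtype; replace with
# real FS-W-PA-14 features computed over a 2-hour frame.
x = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()

probs = interpreter.get_tensor(out["index"])
print("predicted stages:", probs.argmax(axis=-1))
```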