The following table provides the latest pre-trained models for sleep detection. Below we also provide additional details including training configuration, accuracy metrics, and hardware performance results for the models.
| NAME        | LOCATION | # CLASSES | MODEL | PARAMS | FLOPS    | ACCURACY | AP    |
|-------------|----------|-----------|-------|--------|----------|----------|-------|
| SS-2-TCN-SM | Wrist    | 2         | TCN   | 10K    | 1.7M/hr  | 88.9%    | 96.0% |
| SS-3-TCN-SM | Wrist    | 3         | TCN   | 11K    | 1.9M/hr  | 84.4%    | 91.8% |
| SS-4-TCN-SM | Wrist    | 4         | TCN   | 26K    | 4.2M/hr  | 75.5%    | 81.6% |
| SS-5-TCN-SM | Wrist    | 5         | TCN   | 28K    | 4.6M/hr  | 68.4%    | 74.1% |
Model Details
The SS-2-TCN-SM model is a 2-stage sleep classification model that uses a Temporal Convolutional Network (TCN) architecture. The model is trained on PPG and IMU data collected from the wrist and classifies sleep and wake stages. It requires only 10K parameters and achieves an accuracy of 88.9% and an average AP score of 96.0%.
The SS-3-TCN-SM model is a 3-stage sleep classification model that uses a Temporal Convolutional Network (TCN) architecture. The model is trained on PPG and IMU data collected from the wrist and classifies awake, sleep, and REM stages. It requires only 11K parameters and achieves an accuracy of 84.4% and an average AP score of 91.8%.
The SS-4-TCN-SM model is a 4-stage sleep classification model that uses a Temporal Convolutional Network (TCN) architecture. The model is trained on PPG and IMU data collected from the wrist and classifies awake, core, deep, and REM stages. It requires only 26K parameters and achieves an accuracy of 75.5% and an average AP score of 81.6%.
The SS-5-TCN-SM model is a 5-stage sleep classification model that uses a Temporal Convolutional Network (TCN) architecture. The model is trained on PPG and IMU data collected from the wrist and classifies awake, N1, N2, N3, and REM stages. It requires only 28K parameters and achieves an accuracy of 68.4% and an average AP score of 74.1%.
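All four models share a TCN backbone, which stacks dilated causal 1-D convolutions so the receptive field grows exponentially with depth while the parameter count stays small. As an illustration only (not the models' actual implementation), a minimal NumPy sketch of one dilated causal convolution:

```python
import numpy as np

def dilated_causal_conv1d(x, w, dilation=1):
    """Apply a dilated causal 1-D convolution to signal x.

    x: (T,) input signal, w: (K,) kernel. Output has the same length;
    each output sample depends only on current and past inputs.
    """
    T, K = len(x), len(w)
    pad = (K - 1) * dilation               # left-pad so the conv is causal
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros(T)
    for t in range(T):
        for k in range(K):
            y[t] += w[k] * xp[pad + t - k * dilation]
    return y

# Stacking such layers with dilations 1, 2, 4, ... grows the receptive
# field exponentially with depth at a small parameter cost.
x = np.arange(8, dtype=float)
y = dilated_causal_conv1d(x, np.array([0.5, 0.5]), dilation=2)
# y[t] averages x[t] and x[t-2] → [0.0, 0.5, 1.0, 2.0, 3.0, ...]
```

A production implementation would vectorize this loop and add the pointwise projections and residual connections of a full TCN block; the sketch only shows the causal dilation mechanism.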
The following table provides the latest performance and accuracy results of all models when running on the Apollo4 Plus EVB. These results are obtained using neuralSPOT's AutoDeploy tool. From the neuralSPOT repo, the following command can be used to capture EVB results via AutoDeploy:
In addition, we can capture statistics from each layer. The following bar plot provides the latency of each block in the 4-stage sleep classification TCN model. For example, ENC refers to the initial encoder 1-D separable convolutional layer, B1.1 refers to all the layers in block 1, depth 1, B1.2 refers to block 1, depth 2, and so on. We can see that latency increases as we go deeper into the network due to the growing number of channels. The final DEC layer refers to the decoder, a 1-D convolutional layer with 3 output channels (4 classes).
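The latency trend described above follows directly from each block's operation count. The sketch below estimates multiply-accumulate counts for a depthwise-separable 1-D convolution; the sequence length and channel widths are illustrative placeholders, not the model's actual dimensions.

```python
def separable_conv1d_flops(seq_len, in_ch, out_ch, kernel):
    """Rough multiply-accumulate count for a depthwise-separable 1-D conv:
    a depthwise pass (one kernel per input channel) followed by a 1x1
    pointwise projection. Assumes 'same' padding and stride 1."""
    depthwise = seq_len * in_ch * kernel
    pointwise = seq_len * in_ch * out_ch
    return depthwise + pointwise

# Doubling the channel width roughly quadruples the pointwise term,
# which is why the deeper, wider blocks dominate the latency profile.
shallow = separable_conv1d_flops(240, 16, 16, kernel=5)  # 80,640 MACs
deep = separable_conv1d_flops(240, 32, 32, kernel=5)     # 284,160 MACs
```

Note that the pointwise term scales with `in_ch * out_ch`, so it quickly overtakes the depthwise term as channels widen.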
Comparison
We compare our 3-stage and 4-stage models to the SLAMSS model from Song et al., 2023. Their model was also trained on the MESA dataset using only motor and cardiac physiological signals. In particular, they extract activity count, heart rate, and heart rate standard deviation in 30-second epochs. They feed 12 epochs (6 minutes) of the 3 features (12x6) as input to the network. The network consists of 3 1-D CNN layers, 2 LSTM layers, and 1 attention layer. The underlying design of the attention block is unclear, but using only the 3 CNN and 2 LSTM layers, the network requires roughly 8.8 MFLOPS per epoch. This equates to roughly 250X more computation (1,056 MFLOPS/hr) compared to our 4-stage sleep classification model (4.2 MFLOPS/hr).
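The per-hour figure for SLAMSS follows from scoring 120 thirty-second epochs per hour; a quick sanity check of that conversion:

```python
# Convert SLAMSS's reported per-epoch cost into a per-hour figure.
EPOCH_SEC = 30                        # standard 30-second sleep-scoring epochs
epochs_per_hour = 3600 // EPOCH_SEC   # 120 epochs scored per hour

slamss_mflops_per_epoch = 8.8         # CNN + LSTM layers only (attention excluded)
slamss_mflops_per_hour = slamss_mflops_per_epoch * epochs_per_hour  # ~1,056 MFLOPS/hr
```

The ratio against a TCN model then comes from dividing this per-hour cost by the per-hour FLOPS listed in the model table above.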