
🏭 Dataset Factory

HeartKit supports a number of datasets for training and evaluating models on its heart-monitoring tasks. Most of these datasets are publicly available and can be downloaded for training and evaluation. Please review each dataset's license for terms and limitations.

Denoise Datasets

ECG denoising is the process of removing noise from an ECG signal. The following datasets are available for denoising tasks:

  • LUDB: The Lobachevsky University Electrocardiography Database consists of 200 10-second, 12-lead records. The boundaries and peaks of the P waves, T waves, and QRS complexes were manually annotated by cardiologists, and each record is labeled with the corresponding diagnosis.

  • PTB-XL: PTB-XL is a large, publicly available electrocardiography dataset. It contains 21,837 clinical 12-lead ECGs, each 10 seconds long, from 18,885 patients. The ECGs are sampled at 500 Hz and are annotated by up to two cardiologists.

  • Synthetic: A synthetic dataset generated with PhysioKit, which can produce ECG signals spanning a variety of heart conditions and noise levels.
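Denoising models are usually trained on pairs of a noisy input and its clean target. The sketch below illustrates that idea in plain Python under simplifying assumptions: the pulse-train "clean" signal, the noise model (sinusoidal baseline wander plus white Gaussian noise), and all function names are hypothetical and are not HeartKit's or PhysioKit's API.

```python
import math
import random


def synthetic_clean_signal(n_samples: int, fs: float, heart_rate_bpm: float = 60.0) -> list[float]:
    """Hypothetical toy 'clean' signal: one narrow Gaussian pulse per beat (a crude R-peak stand-in)."""
    period = 60.0 / heart_rate_bpm  # seconds per beat
    signal = []
    for i in range(n_samples):
        t = i / fs
        phase = (t % period) - period / 2  # time relative to the pulse at mid-beat
        signal.append(math.exp(-(phase ** 2) / (2 * 0.01 ** 2)))
    return signal


def add_noise(clean: list[float], fs: float, baseline_amp: float = 0.2,
              baseline_hz: float = 0.3, noise_std: float = 0.05, seed: int = 0) -> list[float]:
    """Return clean + low-frequency baseline wander + white Gaussian noise (assumed noise model)."""
    rng = random.Random(seed)
    return [
        x + baseline_amp * math.sin(2 * math.pi * baseline_hz * i / fs) + rng.gauss(0.0, noise_std)
        for i, x in enumerate(clean)
    ]


def snr_db(clean: list[float], noisy: list[float]) -> float:
    """Signal-to-noise ratio in dB of the noisy signal relative to the clean reference."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10.0 * math.log10(signal_power / noise_power)
```

For example, `clean = synthetic_clean_signal(5000, 500.0)` yields a 10-second trace at 500 Hz, and `add_noise(clean, 500.0, noise_std=0.2)` gives a harder (lower-SNR) input than `noise_std=0.01`. Varying the noise parameters is one simple way to span a range of noise levels.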


Segmentation Datasets

ECG segmentation is the process of identifying the boundaries of the P-wave, QRS complex, and T-wave in an ECG signal. The following datasets are available for segmentation tasks:

  • LUDB: The Lobachevsky University Electrocardiography Database consists of 200 10-second, 12-lead records. The boundaries and peaks of the P waves, T waves, and QRS complexes were manually annotated by cardiologists, and each record is labeled with the corresponding diagnosis.

  • QTDB: The QT Database contains over 100 fifteen-minute, two-lead ECG recordings. In each recording, 30 to 50 selected beats are annotated with onset, peak, and end markers for the P, QRS, and T waves and, where present, the U wave.

  • Synthetic: A synthetic dataset generated with PhysioKit, which can produce ECG signals spanning a variety of heart conditions and noise levels.
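LUDB and QTDB provide wave boundaries as onset/offset markers, whereas a segmentation model typically predicts one label per sample. The following sketch converts between the two representations. The class IDs, the helper names, and the assumption that offsets are inclusive sample indices are all hypothetical; HeartKit's actual label mapping may differ.

```python
# Hypothetical class IDs for a per-sample segmentation target.
BACKGROUND, P_WAVE, QRS, T_WAVE = 0, 1, 2, 3


def intervals_to_mask(n_samples, intervals):
    """Convert (onset, offset, class_id) annotations into one label per sample.

    Onset and offset are sample indices; the offset is treated as inclusive.
    Intervals are clipped to the signal, and later intervals overwrite earlier ones.
    """
    mask = [BACKGROUND] * n_samples
    for onset, offset, class_id in intervals:
        start = max(0, onset)
        stop = min(n_samples - 1, offset)
        for i in range(start, stop + 1):
            mask[i] = class_id
    return mask


def mask_to_intervals(mask):
    """Recover (onset, offset, class_id) runs for every non-background segment."""
    intervals = []
    start = None
    for i, label in enumerate(mask + [BACKGROUND]):  # background sentinel closes a trailing run
        if start is not None and label != mask[start]:
            intervals.append((start, i - 1, mask[start]))
            start = None
        if start is None and label != BACKGROUND:
            start = i
    return intervals
```

For instance, `intervals_to_mask(10, [(1, 2, P_WAVE), (4, 5, QRS), (7, 8, T_WAVE)])` produces `[0, 1, 1, 0, 2, 2, 0, 3, 3, 0]`, and `mask_to_intervals` turns that mask back into the original three intervals, which is how predicted masks can be compared against annotated boundaries.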


Rhythm Datasets

Rhythm detection is the process of identifying abnormal heart rhythms. The following datasets are available for rhythm tasks:

  • Icentia11k: This dataset consists of ECG recordings from 11,000 patients with 2 billion labelled beats. The data was collected with the CardioSTAT, a single-lead heart monitor from Icentia. The raw signals were recorded at 16-bit resolution and sampled at 250 Hz, with the CardioSTAT placed in a modified lead I position.

  • PTB-XL: PTB-XL is a large, publicly available electrocardiography dataset. It contains 21,837 clinical 12-lead ECGs, each 10 seconds long, from 18,885 patients. The ECGs are sampled at 500 Hz and are annotated by up to two cardiologists.

  • LSAD: The Large Scale Arrhythmia Database (LSAD) is a large, publicly available electrocardiography dataset. It contains 10-second, 12-lead ECGs from 45,152 patients, sampled at 500 Hz and annotated by up to two cardiologists.

  • Synthetic: A synthetic dataset generated with PhysioKit, which can produce ECG signals spanning a variety of heart conditions and noise levels.
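The rhythm datasets use different sampling rates (250 Hz for Icentia11k, 500 Hz for PTB-XL and LSAD), so records are commonly brought to a shared rate and cut into fixed-length windows before training. The sketch below shows that arithmetic with linear interpolation and simple framing. The function names and window settings are hypothetical, not HeartKit's preprocessing pipeline, and linear interpolation applies no anti-aliasing filter; a polyphase resampler such as `scipy.signal.resample_poly` is the safer choice in practice.

```python
def resample_linear(signal, fs_in, fs_out):
    """Resample by linear interpolation (sketch only: no anti-aliasing filter)."""
    n_out = int(round(len(signal) * fs_out / fs_in))
    out = []
    for j in range(n_out):
        pos = j * fs_in / fs_out  # fractional index into the input signal
        i = int(pos)
        if i >= len(signal) - 1:
            out.append(signal[-1])  # hold the last sample past the end
        else:
            frac = pos - i
            out.append(signal[i] * (1 - frac) + signal[i + 1] * frac)
    return out


def frame(signal, fs, window_s, stride_s):
    """Split a signal into fixed-length windows; a trailing partial window is dropped."""
    size = int(window_s * fs)
    step = int(stride_s * fs)
    return [signal[k:k + size] for k in range(0, len(signal) - size + 1, step)]
```

As a worked example, a 10-second PTB-XL or LSAD record at 500 Hz has 10 × 500 = 5,000 samples; resampling it to Icentia11k's 250 Hz leaves 2,500 samples, and `frame(..., fs=250, window_s=5, stride_s=5)` then yields two windows of 1,250 samples each.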


Beat Datasets

Beat classification is the process of identifying abnormal beats in an ECG signal. The following datasets are available for beat classification tasks:

  • Icentia11k: This dataset consists of ECG recordings from 11,000 patients with 2 billion labelled beats. The data was collected with the CardioSTAT, a single-lead heart monitor from Icentia. The raw signals were recorded at 16-bit resolution and sampled at 250 Hz, with the CardioSTAT placed in a modified lead I position.

  • PTB-XL: PTB-XL is a large, publicly available electrocardiography dataset. It contains 21,837 clinical 12-lead ECGs, each 10 seconds long, from 18,885 patients. The ECGs are sampled at 500 Hz and are annotated by up to two cardiologists.
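Because beat classification labels individual heartbeats, a common preprocessing step is to cut a short window around each annotated beat location. The sketch below assumes beat annotations are given as sample indices (for example, R-peak positions); the function name and the default window lengths (0.2 s before and 0.4 s after the beat) are hypothetical choices, not HeartKit settings.

```python
def beat_windows(signal, beat_indices, fs, pre_s=0.2, post_s=0.4):
    """Extract a fixed-length window around each annotated beat.

    Beats too close to either edge of the recording to fit a full window
    are skipped. Returns a list of (beat_index, window) pairs.
    """
    pre = int(round(pre_s * fs))    # samples kept before the beat
    post = int(round(post_s * fs))  # samples kept from the beat onward
    windows = []
    for idx in beat_indices:
        start, stop = idx - pre, idx + post
        if start < 0 or stop > len(signal):
            continue  # not enough context on one side of this beat
        windows.append((idx, signal[start:stop]))
    return windows
```

At Icentia11k's 250 Hz, these defaults give 50 samples before and 100 samples from each beat, so every window is 150 samples long regardless of where the beat falls in the recording; each window can then be paired with that beat's label.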