dataset
Classes
HKDataset
HKDataset serves as a base class to download and provide unified access to datasets.
Parameters:
- path (PathLike) – Path to dataset.
- cacheable (bool, default: True) – If dataset supports file caching. Defaults to True.
Example:
    import contextlib
    from typing import Generator

    import numpy as np
    import numpy.typing as npt
    import heartkit as hk

    # NOTE: PatientData and PatientGenerator are heartkit types; the original
    # example does not show where they are imported from.

    class MyDataset(hk.HKDataset):

        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)

        @property
        def name(self) -> str:
            return 'my-dataset'

        @property
        def sampling_rate(self) -> int:
            return 100

        def get_train_patient_ids(self) -> npt.NDArray:
            # Patients 0..79 are used for training
            return np.arange(80)

        def get_test_patient_ids(self) -> npt.NDArray:
            # Patients 80..99 are held out for testing
            return np.arange(80, 100)

        @contextlib.contextmanager
        def patient_data(self, patient_id: int) -> Generator[PatientData, None, None]:
            # Synthetic signal and segmentations stand in for real patient records
            data = np.random.randn(1000)
            segs = np.random.randint(0, 1000, (10, 2))
            yield {"data": data, "segmentations": segs}

        def signal_generator(
            self,
            patient_generator: PatientGenerator,
            frame_size: int,
            samples_per_patient: int = 1,
            target_rate: int | None = None,
        ) -> Generator[npt.NDArray, None, None]:
            # Yields the full signal; frame_size and target_rate are unused here
            for patient in patient_generator:
                for _ in range(samples_per_patient):
                    with self.patient_data(patient) as pt:
                        yield pt["data"]

        def download(self, num_workers: int | None = None, force: bool = False):
            # Nothing to download for synthetic data
            pass

    # Register dataset
    hk.DatasetFactory.register("my-dataset", MyDataset)
Source code in heartkit/datasets/dataset.py
Attributes
cacheable (property, writable)
If dataset supports in-memory caching.
On smaller datasets, it is recommended to cache the entire dataset in memory.
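As a rough illustration of what a writable caching flag can look like, the sketch below keeps loaded records in a dictionary while the flag is on. The class `CachedDataset`, its `_load` method, and `load_count` are hypothetical names made up for this example; this is not heartkit's implementation.

```python
class CachedDataset:
    """Hypothetical dataset with a writable `cacheable` flag (not heartkit code)."""

    def __init__(self, cacheable: bool = True):
        self._cacheable = cacheable
        self._cache: dict[int, list[float]] = {}
        self.load_count = 0  # counts how often the slow loader runs

    @property
    def cacheable(self) -> bool:
        return self._cacheable

    @cacheable.setter
    def cacheable(self, value: bool) -> None:
        self._cacheable = value
        if not value:
            # Free memory as soon as caching is switched off
            self._cache.clear()

    def _load(self, patient_id: int) -> list[float]:
        # Stand-in for an expensive read from disk
        self.load_count += 1
        return [float(patient_id)] * 4

    def get(self, patient_id: int) -> list[float]:
        if not self._cacheable:
            return self._load(patient_id)
        if patient_id not in self._cache:
            self._cache[patient_id] = self._load(patient_id)
        return self._cache[patient_id]


ds = CachedDataset(cacheable=True)
ds.get(1)
ds.get(1)  # served from the in-memory cache
loads_with_cache = ds.load_count

ds.cacheable = False
ds.get(1)
ds.get(1)  # loaded again on every call
loads_total = ds.load_count
```

With caching on, the second `get` call does not touch the loader; once the flag is set to `False`, every call does.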
Functions
get_train_patient_ids
get_test_patient_ids
patient_data
Get patient data
Parameters:
- patient_id (int) – Patient ID
Returns:
- Generator[PatientData, None, None] – Patient data
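The return type above suggests `patient_data` is meant to be used as a context manager, as in the class example. The sketch below shows that pattern on its own, using only the standard library. The `PatientData` TypedDict here is a hypothetical stand-in for heartkit's type, and the data is synthetic.

```python
import contextlib
import random
from typing import Generator, TypedDict


class PatientData(TypedDict):
    """Hypothetical stand-in for heartkit's PatientData type."""

    data: list[float]
    segmentations: list[tuple[int, int]]


@contextlib.contextmanager
def patient_data(patient_id: int) -> Generator[PatientData, None, None]:
    """Yield synthetic data for one patient."""
    # Seed per patient so repeated calls return the same record
    rng = random.Random(patient_id)
    data = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    segs = [(rng.randrange(1000), rng.randrange(1000)) for _ in range(10)]
    try:
        yield {"data": data, "segmentations": segs}
    finally:
        # A file-backed dataset would release its file handle here
        pass


with patient_data(3) as pt:
    num_samples = len(pt["data"])
    num_segs = len(pt["segmentations"])
```

Wrapping the access in `with` gives a file-backed dataset a defined point at which to release resources once the caller is done with the record.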
signal_generator
signal_generator(
patient_generator: PatientGenerator, frame_size: int, samples_per_patient: int = 1, target_rate: int | None = None
) -> Generator[npt.NDArray, None, None]
Generate random frames.
Parameters:
- patient_generator (PatientGenerator) – Generator that yields patient data.
- frame_size (int) – Frame size
- samples_per_patient (int, default: 1) – Samples per patient. Defaults to 1.
- target_rate (int | None, default: None) – Target rate. Defaults to None.
Returns:
- Generator[npt.NDArray, None, None] – Generator of data samples
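One way to read "generate random frames" is: for each patient, pick `samples_per_patient` random start offsets and yield a window of `frame_size` samples from each. The sketch below illustrates that reading with plain Python lists. The `patient_signals` and `seed` parameters are assumptions made for the example, and resampling to `target_rate` is left out.

```python
import random
from typing import Generator, Iterable, Optional


def signal_generator(
    patient_signals: Iterable[list[float]],
    frame_size: int,
    samples_per_patient: int = 1,
    seed: Optional[int] = None,
) -> Generator[list[float], None, None]:
    """Yield random contiguous frames of length frame_size from each signal."""
    rng = random.Random(seed)
    for signal in patient_signals:
        if len(signal) < frame_size:
            # Signal too short to hold a full frame; skip this patient
            continue
        for _ in range(samples_per_patient):
            # randint is inclusive, so the last valid start is len - frame_size
            start = rng.randint(0, len(signal) - frame_size)
            yield signal[start : start + frame_size]


signals = [
    [float(i) for i in range(500)],  # long enough for 100-sample frames
    [float(i) for i in range(50)],   # too short, will be skipped
]
frames = list(signal_generator(signals, frame_size=100, samples_per_patient=3, seed=0))
```

Here only the first signal contributes frames, three in total, each 100 samples long.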
download
split_train_test_patients
split_train_test_patients(
patient_ids: NDArray, test_size: float, label_map: dict[int, int] | None = None, label_type: str | None = None
) -> list[list[int]]
Perform a train/test split on patients for the given task. NOTE: Only inter-patient splits are performed, not intra-patient splits, so all data from a given patient falls entirely within one split.
Parameters:
- patient_ids (NDArray) – Patient IDs
- test_size (float) – Test size
- label_map (dict[int, int], default: None) – Label map. Defaults to None.
- label_type (str, default: None) – Label type. Defaults to None.
Returns:
- list[list[int]] – Train and test patient IDs
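To make the inter-patient idea concrete, the sketch below shuffles patient IDs and cuts the list once, so no patient appears in both splits. The function name `split_patients` and its `seed` parameter are hypothetical, and the sketch does not reproduce how heartkit uses `label_map` or `label_type`.

```python
import random
from typing import Optional


def split_patients(
    patient_ids: list[int],
    test_size: float,
    seed: Optional[int] = None,
) -> tuple[list[int], list[int]]:
    """Split patient IDs into disjoint train and test lists."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be a fraction between 0 and 1")
    shuffled = list(patient_ids)  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    # Keep at least one test patient even for very small fractions
    num_test = max(1, round(len(shuffled) * test_size))
    return shuffled[num_test:], shuffled[:num_test]


train_ids, test_ids = split_patients(list(range(100)), test_size=0.2, seed=42)
```

Because the split is over patient IDs rather than over individual frames, a model evaluated on `test_ids` never sees signal from a patient it trained on.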
filter_patients_for_labels
filter_patients_for_labels(
patient_ids: NDArray, label_map: dict[int, int] | None = None, label_type: str | None = None
) -> npt.NDArray
Filter patients for given labels.
Parameters:
- patient_ids (NDArray) – Patient IDs
- label_map (dict[int, int], default: None) – Label map. Defaults to None.
- label_type (str, default: None) – Label type. Defaults to None.
Returns:
- npt.NDArray – Filtered patient IDs
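The filtering rule itself is not spelled out here. The sketch below assumes one plausible rule: keep a patient if at least one of its labels is a key in `label_map`, and keep everyone when `label_map` is None. The `patient_labels` lookup is a hypothetical input added for the example, and `label_type` is omitted.

```python
from typing import Optional


def filter_patients_for_labels(
    patient_ids: list[int],
    patient_labels: dict[int, set[int]],
    label_map: Optional[dict[int, int]] = None,
) -> list[int]:
    """Keep patients that carry at least one label listed in label_map."""
    if label_map is None:
        # No label map given: nothing to filter on
        return list(patient_ids)
    wanted = set(label_map)  # source labels of interest
    return [
        pid for pid in patient_ids
        if patient_labels.get(pid, set()) & wanted
    ]


# Hypothetical labels per patient
patient_labels = {0: {0, 1}, 1: {2}, 2: set(), 3: {1, 3}}
kept = filter_patients_for_labels([0, 1, 2, 3], patient_labels, label_map={0: 0, 1: 1})
unfiltered = filter_patients_for_labels([0, 1, 2, 3], patient_labels)
```

In this example patients 1 and 2 are dropped because neither has label 0 or 1.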