dataset
Classes
HKDataset
HKDataset serves as a base class to download and provide unified access to datasets.
Parameters:
- path (PathLike) – Path to dataset.
- cacheable (bool, default: True) – If dataset supports file caching. Defaults to True.
Example:
import contextlib
from typing import Generator

import numpy as np
import numpy.typing as npt
import heartkit as hk

# Assumed import path for these heartkit type aliases; adjust for your installed version.
from heartkit.datasets.defines import PatientData, PatientGenerator


class MyDataset(hk.HKDataset):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    @property
    def name(self) -> str:
        return 'my-dataset'

    @property
    def sampling_rate(self) -> int:
        return 100

    def get_train_patient_ids(self) -> npt.NDArray:
        return np.arange(80)

    def get_test_patient_ids(self) -> npt.NDArray:
        return np.arange(80, 100)

    @contextlib.contextmanager
    def patient_data(self, patient_id: int) -> Generator[PatientData, None, None]:
        # Synthetic signal and segment boundaries stand in for real patient records.
        data = np.random.randn(1000)
        segs = np.random.randint(0, 1000, (10, 2))
        yield {"data": data, "segmentations": segs}

    def signal_generator(
        self,
        patient_generator: PatientGenerator,
        frame_size: int,
        samples_per_patient: int = 1,
        target_rate: int | None = None,
    ) -> Generator[npt.NDArray, None, None]:
        for patient in patient_generator:
            for _ in range(samples_per_patient):
                with self.patient_data(patient) as pt:
                    yield pt["data"]

    def download(self, num_workers: int | None = None, force: bool = False):
        pass


# Register dataset
hk.DatasetFactory.register("my-dataset", MyDataset)
Source code in heartkit/datasets/dataset.py
Attributes
cacheable (property, writable)
If dataset supports in-memory caching.
On smaller datasets, it is recommended to cache the entire dataset in memory.
Functions
get_train_patient_ids
get_test_patient_ids
patient_data
Get patient data
Parameters:
- patient_id (int) – Patient ID
Returns:
- Generator[PatientData, None, None] – Patient data
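Since patient_data is a generator-based context manager (as in the class example above), callers would typically consume it in a with block so that any underlying resource is released on exit. The following is a minimal sketch of that pattern only: ToyDataset and its open_handles counter are hypothetical stand-ins, not part of heartkit, and plain lists replace NumPy arrays to keep it self-contained.

```python
import contextlib
import random
from typing import Generator


class ToyDataset:
    """Hypothetical stand-in for an HKDataset subclass (not part of heartkit)."""

    def __init__(self) -> None:
        self.open_handles = 0  # counts patients that are currently "open"

    @contextlib.contextmanager
    def patient_data(self, patient_id: int) -> Generator[dict, None, None]:
        rng = random.Random(patient_id)  # deterministic fake signal per patient
        self.open_handles += 1
        try:
            yield {"data": [rng.gauss(0.0, 1.0) for _ in range(1000)]}
        finally:
            # Runs when the with-block exits, even if the caller raised.
            self.open_handles -= 1


ds = ToyDataset()
with ds.patient_data(3) as pt:
    num_samples = len(pt["data"])
    handles_inside = ds.open_handles
handles_after = ds.open_handles
```

The try/finally around the yield is what guarantees cleanup; a real dataset would presumably close a file handle there rather than decrement a counter.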
signal_generator
signal_generator(
patient_generator: PatientGenerator, frame_size: int, samples_per_patient: int = 1, target_rate: int | None = None
) -> Generator[npt.NDArray, None, None]
Generate random frames.
Parameters:
- patient_generator (PatientGenerator) – Generator that yields patient data.
- frame_size (int) – Frame size
- samples_per_patient (int, default: 1) – Samples per patient. Defaults to 1.
- target_rate (int | None, default: None) – Target rate. Defaults to None.
Returns:
- Generator[npt.NDArray, None, None] – Generator of data samples
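One plausible reading of "generate random frames" is: for each patient, draw samples_per_patient windows of length frame_size at random offsets. The sketch below illustrates that idea under stated assumptions. The random_frames function and its seed parameter are hypothetical, plain lists stand in for NumPy arrays, and resampling to target_rate is left out.

```python
import random
from typing import Iterable, Iterator


def random_frames(
    signals: Iterable[list[float]],
    frame_size: int,
    samples_per_patient: int = 1,
    seed: int = 0,
) -> Iterator[list[float]]:
    """Yield `samples_per_patient` random windows of `frame_size` per signal."""
    rng = random.Random(seed)
    for signal in signals:
        if len(signal) < frame_size:
            continue  # too short to cut a full frame; skipped in this sketch
        for _ in range(samples_per_patient):
            # randint is inclusive on both ends, so the last valid start is allowed.
            start = rng.randint(0, len(signal) - frame_size)
            yield signal[start : start + frame_size]


# Two fake "patients", each a ramp of 1000 samples.
signals = [[float(i) for i in range(1000)] for _ in range(2)]
frames = list(random_frames(signals, frame_size=250, samples_per_patient=3))
```

Each yielded frame is a contiguous slice of one patient's signal, so frames never mix data from different patients.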
download
split_train_test_patients
split_train_test_patients(
patient_ids: npt.NDArray, test_size: float, label_map: dict[int, int] | None = None, label_type: str | None = None
) -> list[list[int]]
Perform a train/test split on patients for the given task. NOTE: Only inter-patient splits are performed, not intra-patient splits; every patient falls entirely in either the train set or the test set.
Parameters:
- patient_ids (NDArray) – Patient IDs
- test_size (float) – Test size
- label_map (dict[int, int], default: None) – Label map. Defaults to None.
- label_type (str, default: None) – Label type. Defaults to None.
Returns:
- list[list[int]] – Train and test patient IDs
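To make the inter-patient constraint concrete, here is a minimal sketch that splits at the level of patient IDs, so no patient can contribute data to both sets. It is an illustration, not the library's implementation: split_patients and its seed parameter are hypothetical, test_size is assumed to be a fraction between 0 and 1, and label_map and label_type are ignored.

```python
import random


def split_patients(
    patient_ids: list[int], test_size: float, seed: int = 0
) -> tuple[list[int], list[int]]:
    """Shuffle patient IDs, then cut once so the two sets cannot overlap."""
    ids = list(patient_ids)  # copy so the caller's list is not shuffled in place
    random.Random(seed).shuffle(ids)
    num_test = round(len(ids) * test_size)
    return ids[num_test:], ids[:num_test]


train_ids, test_ids = split_patients(list(range(100)), test_size=0.2)
```

Because the cut is made on IDs rather than on individual recordings or frames, the test set only contains patients the model has never seen during training.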
filter_patients_for_labels
filter_patients_for_labels(
patient_ids: npt.NDArray, label_map: dict[int, int] | None = None, label_type: str | None = None
) -> npt.NDArray
Filter patients for given labels.
Parameters:
- patient_ids (NDArray) – Patient IDs
- label_map (dict[int, int], default: None) – Label map. Defaults to None.
- label_type (str, default: None) – Label type. Defaults to None.
Returns:
- NDArray – Filtered patient IDs
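The filtering rule itself is not spelled out here. One plausible interpretation is that a patient is kept when at least one of its labels appears as a key in label_map, and that no filtering happens when label_map is None. The sketch below shows that interpretation only. The filter_by_labels function and the patient_labels lookup table are hypothetical, and label_type is not modeled.

```python
from typing import Optional


def filter_by_labels(
    patient_ids: list[int],
    patient_labels: dict[int, set[int]],
    label_map: Optional[dict[int, int]] = None,
) -> list[int]:
    """Keep patients that carry at least one label present in `label_map`."""
    if label_map is None:
        return list(patient_ids)  # nothing to filter on
    wanted = set(label_map)  # source labels of interest (the map's keys)
    return [pid for pid in patient_ids if wanted & patient_labels.get(pid, set())]


# Hypothetical per-patient label sets.
patient_labels = {0: {1}, 1: {2, 5}, 2: {7}, 3: set()}
label_map = {1: 0, 2: 1}  # source label -> target class

kept = filter_by_labels([0, 1, 2, 3], patient_labels, label_map)
unfiltered = filter_by_labels([0, 1, 2, 3], patient_labels)
```

Filtering before the train/test split would keep patients without any relevant labels from taking up space in either set.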