Data
Mimics torch.utils.data.Dataset for ray.data integration.
RayDataset
Bases: IterableDataset
map_(func, *args, **kwargs)
In-place map over the ray.data dataset. Time complexity: O(dataset size / parallelism).
See https://docs.ray.io/en/latest/data/dataset.html#transforming-datasets
map_batch_(func, batch_size=2, **kwargs)
In-place batched map over the ray.data dataset. Time complexity: O(dataset size / parallelism).
See https://docs.ray.io/en/latest/data/dataset.html#transforming-datasets
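The semantics of a batched map can be sketched in plain Python. This is only an illustration of the concept, not the ray.data implementation (which additionally distributes the batches across workers, hence the O(dataset size / parallelism) complexity); the name `map_batches` here is hypothetical:

```python
def map_batches(items, func, batch_size=2):
    """Apply func to successive batches of items and flatten the results.

    Conceptual sketch of batched mapping; ray.data runs such batches
    in parallel across workers rather than in this sequential loop.
    """
    out = []
    for start in range(0, len(items), batch_size):
        # func receives a whole batch (a list slice), not a single element
        out.extend(func(items[start:start + batch_size]))
    return out

doubled = map_batches([1, 2, 3, 4, 5], lambda batch: [x * 2 for x in batch])
# doubled == [2, 4, 6, 8, 10]
```

Passing the function a batch at a time amortizes per-call overhead, which is why ray.data exposes batched mapping alongside per-row mapping.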
RayImageFolder
Bases: RayDataset
Reads image datasets laid out with one class per subdirectory:
root/dog/xxx.png
root/dog/xxy.png
root/dog/[...]/xxz.png
root/cat/123.png
root/cat/nsdf3.png
root/cat/[...]/asd932_.png
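The one-class-per-subdirectory convention above can be illustrated with a small helper that infers class labels from folder names. This is a sketch of the convention (the same one torchvision's ImageFolder uses), not the actual RayImageFolder code, and the name `find_classes` is hypothetical:

```python
import os
import tempfile
from pathlib import Path

def find_classes(directory):
    # Class names are the immediate subdirectory names, sorted so the
    # class -> index mapping is stable across runs.
    classes = sorted(d.name for d in Path(directory).iterdir() if d.is_dir())
    return classes, {cls: idx for idx, cls in enumerate(classes)}

# Build a tiny root/{cat,dog}/ layout and recover the label mapping.
root = tempfile.mkdtemp()
for cls in ("dog", "cat"):
    os.makedirs(os.path.join(root, cls), exist_ok=True)

classes, class_to_idx = find_classes(root)
# classes == ['cat', 'dog']; class_to_idx == {'cat': 0, 'dog': 1}
```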
Data loader for image datasets
image_dataset_from_directory(directory, transform=None, image_size=(224, 224), batch_size=1, shuffle=False, pin_memory=True, num_workers=None, ray_data=False)
Create a Dataset and DataLoader for an image-folder dataset.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `directory` | `Union[List[str], Path, str]` | Directory (or list of directories) containing the images. | *required* |
| `transform` | | Optional transform applied to each image. | `None` |
| `image_size` | | Target size images are resized to. | `(224, 224)` |
| `batch_size` | `int` | Number of samples per batch. | `1` |
| `shuffle` | `bool` | Whether to shuffle the data. | `False` |
| `pin_memory` | `bool` | Whether the DataLoader uses pinned (page-locked) memory. | `True` |
| `num_workers` | `Optional[int]` | Number of DataLoader worker processes. | `None` |
| `ray_data` | `bool` | | `False` |
Returns:

| Type | Description |
|---|---|
| `Data` | A dictionary containing the dataset and dataloader. |
Provides common utilities for Datasets.
random_split_dataset(data, pct=0.9)
Randomly splits a dataset into two sets. The length of the first split is len(data) * pct.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Dataset` | PyTorch Dataset object to split. | *required* |
| `pct` | | Percentage of the data placed in the first split. | `0.9` |
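The split semantics can be sketched with the standard library's random module. This illustrates only the length arithmetic; the actual implementation most likely delegates to torch.utils.data.random_split, and the `seed` parameter here is an addition for reproducibility in the sketch:

```python
import random

def random_split(data, pct=0.9, seed=None):
    # Shuffle indices, then cut at len(data) * pct: the first split gets
    # int(len(data) * pct) items, the second split gets the remainder.
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)
    cut = int(len(data) * pct)
    first = [data[i] for i in indices[:cut]]
    second = [data[i] for i in indices[cut:]]
    return first, second

train, valid = random_split(list(range(100)), pct=0.9, seed=0)
# len(train) == 90, len(valid) == 10
```

Because the cut index is `int(len(data) * pct)`, a 100-item dataset with `pct=0.9` yields splits of 90 and 10 items.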