Rosana EL-JURDI
09/03/2023, 5:46 PM
Lodewic van Twillert
09/03/2023, 6:00 PM
• Is pokemon.extras.datasets.image_dataset.ImageDataSet your custom class?
• Do you have the error message? :)
• You call dataset.load(), but the input to your node will already be the loaded dataset: whatever the output of pokemon.extras.datasets.image_dataset.ImageDataSet._load() is, a numpy array maybe?
• Your node is declared with outputs="pokemon_images", but it returns 2 values. So maybe you should be catching both of them?
Rosana EL-JURDI
09/03/2023, 6:04 PM
Lodewic van Twillert
09/03/2023, 6:08 PM
PartitionedDataSet will return a dictionary of datasets.
So change your node function:
• Remove dataset.load()
• Change the input variable name to something like dict_of_images to avoid confusion
• Don't return your input dataset; it seems like your node just needs to count the images
def load_pokemon_images(dict_of_images):
    # Optionally, you can process or manipulate the loaded data here
    # For example, if you want to return the number of images loaded:
    num_images = len(dict_of_images)
    return num_images
Rosana EL-JURDI
09/03/2023, 6:08 PM
"PartitionedDataSet will return a dictionary of datasets.":
Well, I actually need to load the images, but I cannot access the data.
Lodewic van Twillert
09/03/2023, 6:15 PM
def load_pokemon_images(dataset):  # `dataset` is a dictionary, one entry for each image I expect
    num_images = len(dataset)
    # Load all the partitions
    for partition_id, partition_load_func in dataset.items():
        # The actual function that loads the data
        image_data = partition_load_func()
    return num_images
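As a rough illustration of the loop above, here is a minimal sketch that keeps each loaded partition instead of discarding it. It assumes the node input is a dictionary mapping partition IDs to zero-argument load functions; `collect_partitions` and the `fake_partitions` stand-in data are hypothetical names for illustration, not part of Kedro.

```python
from typing import Any, Callable, Dict


def collect_partitions(partitions: Dict[str, Callable[[], Any]]) -> Dict[str, Any]:
    """Call every partition's load function and keep the results by partition ID."""
    loaded = {}
    # Sort the IDs so the result order is deterministic.
    for partition_id in sorted(partitions):
        partition_load_func = partitions[partition_id]
        # Nothing is read until the load function is actually called.
        loaded[partition_id] = partition_load_func()
    return loaded


# Hypothetical stand-in for what a partitioned dataset would pass to the node:
# each value is a callable returning the data (here a tiny nested list as a fake image).
fake_partitions = {
    "pikachu.png": lambda: [[255, 255, 0]],
    "bulbasaur.png": lambda: [[0, 255, 0]],
}

images = collect_partitions(fake_partitions)
num_images = len(images)
```

Returning the dictionary of loaded images (rather than only the count) would let a downstream node work on the actual pixel data.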
Not sure exactly what you want to do to each image, but you could add it in the for-loop.
Rosana EL-JURDI
09/03/2023, 6:22 PM
Lodewic van Twillert
09/03/2023, 6:34 PM
ImageDataSet: what is its parent class? Is it AbstractVersionedDataSet?
Are you sure you need the Versioned dataset?
Does your custom class override __init__() without assigning self._version, perhaps?
Because this line comes from AbstractVersionedDataSet, which by default sets the self._version attribute, I'm surprised your error says the object doesn't have that attribute on this line:
File "/home/rosana.eljurdi/.local/lib/python3.10/site-packages/kedro/io/core.py", line 596, in _get_load_path
    if not self._version:
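To make the suspected failure mode concrete, here is a small sketch of how overriding __init__() without calling the parent's __init__() can produce this kind of AttributeError. `VersionedBase`, `BrokenDataSet`, and `FixedDataSet` are hypothetical stand-ins written for illustration; they are not Kedro's actual classes or code.

```python
class VersionedBase:
    """Hypothetical stand-in for a versioned base class (not Kedro's real code)."""

    def __init__(self, filepath, version=None):
        self._filepath = filepath
        self._version = version

    def _get_load_path(self):
        # Relies on self._version having been set by __init__.
        if not self._version:
            return self._filepath
        return f"{self._filepath}/{self._version}"


class BrokenDataSet(VersionedBase):
    def __init__(self, filepath):
        # Overrides __init__ but never calls super().__init__(),
        # so self._version is never assigned.
        self._filepath = filepath


class FixedDataSet(VersionedBase):
    def __init__(self, filepath, version=None):
        # Delegates to the parent so self._version always exists.
        super().__init__(filepath, version)


try:
    BrokenDataSet("data/img.png")._get_load_path()
    broken_error = None
except AttributeError as exc:
    broken_error = str(exc)

fixed_path = FixedDataSet("data/img.png")._get_load_path()
```

In this sketch only the subclass that delegates to the parent constructor can resolve its load path; the other one fails as soon as the base method reads self._version.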
Rosana EL-JURDI
09/03/2023, 6:37 PM
from pathlib import PurePosixPath
from kedro.io.core import (
    AbstractVersionedDataSet,
    get_filepath_str,
    get_protocol_and_path,
)
import fsspec
import numpy as np
# PIL is the package from Pillow
from PIL import Image
from typing import Dict, Any


class ImageDataSet(AbstractVersionedDataSet):
    def __init__(self, filepath: str):
        """Creates a new instance of ImageDataSet to load / save image data for given
        filepath.

        Args:
            filepath: The location of the image file to load / save data.
        """
        # parse the path and protocol (e.g. file, http, s3, etc.)
        protocol, path = get_protocol_and_path(filepath)
        self._protocol = protocol
        self._filepath = PurePosixPath(path)
        self._fs = fsspec.filesystem(self._protocol)

    def _load(self) -> np.ndarray:
        """Loads data from the image file.

        Returns:
            Data from the image file as a numpy array
        """
        # using get_filepath_str ensures that the protocol and path are appended
        # correctly for different filesystems
        load_path = get_filepath_str(self._get_load_path(), self._protocol)
        with self._fs.open(load_path) as f:
            image = Image.open(f).convert("RGBA")
            return np.asarray(image)

    def _save(self, data: np.ndarray) -> None:
        """Saves image data to the specified filepath."""
        # using get_filepath_str ensures that the protocol and path are appended
        # correctly for different filesystems
        save_path = get_filepath_str(self._get_save_path(), self._protocol)
        with self._fs.open(save_path, "wb") as f:
            image = Image.fromarray(data)
            image.save(f)

    def _describe(self) -> Dict[str, Any]:
        """Returns a dict that describes the attributes of the dataset."""
        return dict(filepath=self._filepath, protocol=self._protocol)
Lodewic van Twillert
09/03/2023, 6:38 PM
Try inheriting from AbstractDataSet instead, and let me know how that changes things.
Rosana EL-JURDI
09/03/2023, 6:43 PM
Lodewic van Twillert
09/03/2023, 7:00 PM
Change your _load() function to
def _load(self) -> np.ndarray:
    with self._fs.open(self._filepath) as f:
        image = Image.open(f).convert("RGBA")
        return np.asarray(image)
Second question, have you considered using https://docs.kedro.org/en/stable/kedro_datasets.pillow.ImageDataSet.html?
Rosana EL-JURDI
09/03/2023, 7:16 PM
Lodewic van Twillert
09/03/2023, 7:48 PM
• Inherits AbstractDataSet
• More complete _load() method than the last example I sent
And the Versioned dataset variant: https://docs.kedro.org/en/stable/data/how_to_create_a_custom_dataset.html#how-to-implement-versioning-in-your-dataset
• Inherits AbstractVersionedDataSet
• Be sure to include version as an argument to __init__
• Call super().__init__(..., version=version) to make sure it's instantiated
I think at some point you mixed those two classes and that's where it got broken.
Rosana EL-JURDI
09/03/2023, 8:43 PM
Juan Luis
09/04/2023, 9:05 AM