# questions
r
Thank you in advance!
l
Couple of questions:
• Is `pokemon.extras.datasets.image_dataset.ImageDataSet` your custom class?
• Do you have the error message? :)
My guess so far is that in your node you call `dataset.load()`, but the input to your node will already be the loaded dataset, i.e. whatever `pokemon.extras.datasets.image_dataset.ImageDataSet._load()` returns (a NumPy array, maybe?).
Never mind, `PartitionedDataSet`, I see 🤓 Then my 2nd guess is that in your pipeline you say `outputs="pokemon_images"`, but your node returns 2 values. So maybe you should be catching both of them?
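The first guess (the node receives already-loaded data, not the dataset object) can be illustrated without Kedro at all. This is a simplified, hypothetical stand-in: `simulate_node_run`, `broken_node`, `fixed_node` and `fake_catalog` are illustrative names, not Kedro's API. It only mimics the behaviour the runner has before calling a node function: load each input, then pass the loaded value in.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-in for a catalog: dataset name -> function that loads it.
# The "pokemon" entry mimics a partitioned dataset, whose loaded value is
# itself a dict of partition ID -> load function.
fake_catalog: Dict[str, Callable[[], Any]] = {
    "pokemon": lambda: {"abomasnow": lambda: "image data"},
}


def simulate_node_run(
    func: Callable[[Any], Any],
    input_name: str,
    catalog: Dict[str, Callable[[], Any]],
) -> Any:
    """Simplified runner: load the input first, then call the node function with it."""
    loaded = catalog[input_name]()  # the framework does the loading
    return func(loaded)  # the node only ever receives the loaded value


def broken_node(dataset):
    return dataset.load()  # fails: `dataset` is already a plain dict


def fixed_node(dict_of_images):
    return len(dict_of_images)  # work with the loaded value directly


num_images = simulate_node_run(fixed_node, "pokemon", fake_catalog)
```

Running `simulate_node_run(broken_node, "pokemon", fake_catalog)` raises `AttributeError: 'dict' object has no attribute 'load'`, the same message as in the error below.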
r
```text
INFO     Loading data from 'pokemon' (PartitionedDataset)...                        data_catalog.py:475
INFO     Running node: load_pokemon_images([pokemon]) -> [pokemon_images]           node.py:331
ERROR    Node 'load_pokemon_images([pokemon]) -> [pokemon_images]' failed with error: node.py:356
         'dict' object has no attribute 'load'
WARNING  No nodes ran. Repeat the previous command to attempt a new run.            runner.py:213

Traceback (most recent call last):
  /home/rosana.eljurdi/.local/bin/kedro:8 in <module>
    ❱ 8    sys.exit(main())
  /home/rosana.eljurdi/.local/lib/python3.10/site-packages/kedro/framework/cli/cli.py:211 in main
    ❱ 211  cli_collection()
  /home/rosana.eljurdi/.local/lib/python3.10/site-packages/click/core.py:1157 in __call__
  /home/rosana.eljurdi/.local/lib/python3.10/site-packages/kedro/framework/cli/cli.py:139 in main
    ❱ 139  super().main(
  /home/rosana.eljurdi/.local/lib/python3.10/site-packages/click/core.py:1078 in main
  /home/rosana.eljurdi/.local/lib/python3.10/site-packages/click/core.py:1688 in invoke
  /home/rosana.eljurdi/.local/lib/python3.10/site-packages/click/core.py:1434 in invoke
  /home/rosana.eljurdi/.local/lib/python3.10/site-packages/click/core.py:783 in invoke
  /home/rosana.eljurdi/.local/lib/python3.10/site-packages/kedro/framework/cli/project.py:453 in run
    ❱ 453  session.run(
  /home/rosana.eljurdi/.local/lib/python3.10/site-packages/kedro/framework/session/session.py:435 in run
    ❱ 435  run_result = runner.run(
  /home/rosana.eljurdi/.local/lib/python3.10/site-packages/kedro/runner/runner.py:103 in run
    ❱ 103  self._run(pipeline, catalog, hook_manager, session_id)
  /home/rosana.eljurdi/.local/lib/python3.10/site-packages/kedro/runner/sequential_runner.py:70 in _run
    ❱ 70   run_node(node, catalog, hook_manager, self._is_async, session_id)
  /home/rosana.eljurdi/.local/lib/python3.10/site-packages/kedro/runner/runner.py:331 in run_node
    ❱ 331  node = _run_node_sequential(node, catalog, hook_manager, session_id)
  /home/rosana.eljurdi/.local/lib/python3.10/site-packages/kedro/runner/runner.py:426 in _run_node_sequential
    ❱ 426  outputs = _call_node_run(
  /home/rosana.eljurdi/.local/lib/python3.10/site-packages/kedro/runner/runner.py:392 in _call_node_run
    ❱ 392  raise exc
  /home/rosana.eljurdi/.local/lib/python3.10/site-packages/kedro/runner/runner.py:382 in _call_node_run
    ❱ 382  outputs = node.run(inputs)
  /home/rosana.eljurdi/.local/lib/python3.10/site-packages/kedro/pipeline/node.py:357 in run
    ❱ 357  raise exc
  /home/rosana.eljurdi/.local/lib/python3.10/site-packages/kedro/pipeline/node.py:348 in run
    ❱ 348  outputs = self._run_with_list(inputs, self._inputs)
  /home/rosana.eljurdi/.local/lib/python3.10/site-packages/kedro/pipeline/node.py:388 in _run_with_list
    ❱ 388  return self._func(*(inputs[item] for item in node_inputs))
  /home/rosana.eljurdi/Kedro_Project/pokemon/src/pokemon/pipelines/loader/nodes.py:8 in load_pokemon_images
       6   def load_pokemon_images(dataset):
       7       # Load the dataset using your custom ImageDataSet
    ❱  8       images = dataset.load()
       9
      10       # Optionally, you can process or manipulate the loaded data here
      11       # For example, if you want to return the number of images loaded:
AttributeError: 'dict' object has no attribute 'load'
```
Yes, this is my custom dataset. Thank you for your help 🙂
l
Yeah, so my first guess was correct, I think. The `PartitionedDataSet` will return a dictionary of datasets. So change your node function:
• remove `dataset.load()`
• change the input variable name to something like `dict_of_images` to avoid confusion
• don't return your input dataset; it seems like your node just needs to count the images
```python
def load_pokemon_images(dict_of_images):
    # Optionally, you can process or manipulate the loaded data here
    # For example, if you want to return the number of images loaded:
    num_images = len(dict_of_images)

    return num_images
```
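Per the `PartitionedDataset` documentation, loading returns a dictionary that maps each partition ID to a function that loads that partition on demand, so `len(dict_of_images)` counts the files without reading any of them. A minimal sketch with plain Python stand-ins (the partition IDs and the tiny arrays are made up, and `make_loader` is a hypothetical helper, not part of Kedro):

```python
from typing import Callable, Dict

import numpy as np

load_calls = []  # records which partitions were actually read


def make_loader(partition_id: str) -> Callable[[], np.ndarray]:
    """Hypothetical stand-in for one partition's load function."""
    def _load() -> np.ndarray:
        load_calls.append(partition_id)  # only happens when the function is called
        return np.zeros((2, 2, 4), dtype=np.uint8)  # fake 2x2 RGBA image
    return _load


# Stand-in for what the node receives: partition ID -> load function
dict_of_images: Dict[str, Callable[[], np.ndarray]] = {
    pid: make_loader(pid) for pid in ["abomasnow", "abra", "absol"]
}

num_images = len(dict_of_images)  # counts partitions, loads nothing
first_image = dict_of_images["abomasnow"]()  # calling the function loads one image
```

After this runs, `load_calls` contains only `"abomasnow"`: counting did not trigger any loading.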
r
@Lodewic van Twillert "Yeah so my first guess was correct I think. The `PartitionedDataSet` will return a dictionary of datasets.": well, I actually need to load the images, but I cannot access the data.
I have removed the second part of the function, which returns `len(num_images)`, but I still get the same error when I try to run the `.load()` function.
My objective here is to load the dataset and access the images as NumPy arrays.
l
Okay, so in that case, following the example here: https://docs.kedro.org/en/stable/kedro.io.PartitionedDataset.html, how about this:
```python
def load_pokemon_images(dataset):  # `dataset` is a dictionary, one entry for each image I expect
    num_images = len(dataset)

    # Load all the partitions
    for partition_id, partition_load_func in dataset.items():
        # The actual function that loads the data
        image_data = partition_load_func()

    return num_images
```
Not sure exactly what you want to do with each image, but you could add it inside the for loop.
r
Thank you for your reply. When I run the above code, I get the following error:
```text
image_data = partition_load_func()
Traceback (most recent call last):
  File "/home/rosana.eljurdi/.local/lib/python3.10/site-packages/kedro/io/core.py", line 210, in load
    return self._load()
  File "/home/rosana.eljurdi/Kedro_Project/pokemon/src/pokemon/extras/datasets/image_dataset.py", line 34, in _load
    load_path = get_filepath_str(self._get_load_path(), self._protocol)
  File "/home/rosana.eljurdi/.local/lib/python3.10/site-packages/kedro/io/core.py", line 596, in _get_load_path
    if not self._version:
AttributeError: 'ImageDataSet' object has no attribute '_version'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/rosana.eljurdi/.local/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3508, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-5eebafde8793>", line 1, in <module>
    image_data = partition_load_func()
  File "/home/rosana.eljurdi/.local/lib/python3.10/site-packages/kedro/io/core.py", line 631, in load
    return super().load()
  File "/home/rosana.eljurdi/.local/lib/python3.10/site-packages/kedro/io/core.py", line 219, in load
    raise DatasetError(message) from exc
kedro.io.core.DatasetError: Failed while loading data from data set ImageDataSet(filepath=/home/rosana.eljurdi/Kedro_Project/pokemon/data/01_raw/pokemon-images-and-types/images/images/abomasnow.png, protocol=file).
'ImageDataSet' object has no attribute '_version'
```
I can put it in a for loop; my aim is just to have access to the data as NumPy arrays so that I can create a CNN classifier. It is for learning purposes. My real application is a bit more complicated, as I need to create a SimpleITK (sitk) data loader that deals with sitk images.
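For the goal described above (NumPy arrays to feed a CNN classifier), the for loop can collect each loaded partition and stack the results into one batch. This is a sketch under two assumptions: each load function returns an `np.ndarray` (as the custom `_load()` does), and every image decodes to the same shape, since `np.stack` raises `ValueError` otherwise. `partitions_to_batch` is a hypothetical helper name, and the usage relies on stand-in loaders rather than real files.

```python
from typing import Callable, Dict, List, Tuple

import numpy as np


def partitions_to_batch(
    partitions: Dict[str, Callable[[], np.ndarray]],
) -> Tuple[List[str], np.ndarray]:
    """Load every partition and stack the images into one (N, H, W, C) array.

    Assumes all images have the same shape; np.stack raises ValueError otherwise.
    """
    ids = sorted(partitions)  # fixed order so labels can be matched to images later
    images = [partitions[pid]() for pid in ids]  # call each partition's load function
    return ids, np.stack(images, axis=0)


# Usage with stand-in loaders (hypothetical IDs, 4x4 RGBA arrays)
fake_partitions = {
    "b": lambda: np.ones((4, 4, 4), dtype=np.uint8),
    "a": lambda: np.zeros((4, 4, 4), dtype=np.uint8),
}
ids, batch = partitions_to_batch(fake_partitions)
```

Returning the sorted IDs alongside the batch keeps a stable mapping from row index to image, which is what you would need to attach labels (e.g. Pokémon types) later.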
l
Hmm, not sure exactly without seeing more of your custom `ImageDataSet`. What is its parent class? Is it `AbstractVersionedDataset`? Are you sure you need the versioned dataset? Does your custom class override `__init__()` without assigning `self._version`, perhaps? 🤔 Because this line comes from `AbstractVersionedDataset`, whose `__init__()` would set the `self._version` attribute by default, so I'm surprised it says it doesn't have that attribute on this line in your error:
```text
File "/home/rosana.eljurdi/.local/lib/python3.10/site-packages/kedro/io/core.py", line 596, in _get_load_path
    if not self._version:
```
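The suspected failure mode can be reproduced with a simplified stand-in base class. This is not Kedro's actual code; `VersionedBaseStandIn` and `ImageDataSetNoSuper` are hypothetical names. The point is that the base `__init__` is what creates `_version`, so a subclass that overrides `__init__` without calling `super().__init__(...)` never gets the attribute, and the `if not self._version:` check fails:

```python
class VersionedBaseStandIn:
    """Simplified stand-in for a versioned base class; NOT Kedro's implementation."""

    def __init__(self, filepath: str, version=None):
        self._filepath = filepath
        self._version = version  # only exists if this __init__ actually runs

    def _get_load_path(self) -> str:
        if not self._version:  # mirrors the line quoted in the error
            return self._filepath  # versioning disabled: use the plain filepath
        return f"{self._filepath}/{self._version}"  # simplified versioned path


class ImageDataSetNoSuper(VersionedBaseStandIn):
    def __init__(self, filepath: str):
        # Overrides __init__ without calling super().__init__(...),
        # so self._version is never assigned.
        self._filepath = filepath

    def load_path(self) -> str:
        return self._get_load_path()


try:
    ImageDataSetNoSuper("abomasnow.png").load_path()
except AttributeError as exc:
    print(exc)  # reports the missing '_version' attribute
```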
r
For the `ImageDataSet`:
```python
from pathlib import PurePosixPath
from kedro.io.core import (
    AbstractVersionedDataSet,
    get_filepath_str,
    get_protocol_and_path,
)
import fsspec
import numpy as np
# PIL is the package from Pillow
from PIL import Image
from typing import Dict, Any


class ImageDataSet(AbstractVersionedDataSet):
    def __init__(self, filepath: str):
        """Creates a new instance of ImageDataSet to load / save image data for given
        filepath.

        Args:
            filepath: The location of the image file to load / save data.
        """
        # parse the path and protocol (e.g. file, http, s3, etc.)
        protocol, path = get_protocol_and_path(filepath)
        self._protocol = protocol
        self._filepath = PurePosixPath(path)
        self._fs = fsspec.filesystem(self._protocol)

    def _load(self) -> np.ndarray:
        """Loads data from the image file.

        Returns:
            Data from the image file as a numpy array
        """
        # using get_filepath_str ensures that the protocol and path are appended
        # correctly for different filesystems
        load_path = get_filepath_str(self._get_load_path(), self._protocol)
        with self._fs.open(load_path) as f:
            image = Image.open(f).convert("RGBA")
            return np.asarray(image)

    def _save(self, data: np.ndarray) -> None:
        """Saves image data to the specified filepath."""
        # using get_filepath_str ensures that the protocol and path are appended
        # correctly for different filesystems
        save_path = get_filepath_str(self._get_save_path(), self._protocol)
        with self._fs.open(save_path, "wb") as f:
            image = Image.fromarray(data)
            image.save(f)

    def _describe(self) -> Dict[str, Any]:
        """Returns a dict that describes the attributes of the dataset."""
        return dict(filepath=self._filepath, protocol=self._protocol)
```
l
Change the parent class to `AbstractDataSet` instead, and let me know how that changes things.
r
@Lodewic van Twillert now I get another error: 'ImageDataSet' object has no attribute '_get_load_path'
```text
Traceback (most recent call last):
  File "/home/rosana.eljurdi/.local/lib/python3.10/site-packages/kedro/io/core.py", line 210, in load
    return self._load()
  File "/home/rosana.eljurdi/Kedro_Project/pokemon/src/pokemon/extras/datasets/image_dataset.py", line 34, in _load
    load_path = get_filepath_str(self._get_load_path(), self._protocol)
AttributeError: 'ImageDataSet' object has no attribute '_get_load_path'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/rosana.eljurdi/.local/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3508, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-1-19522f69f156>", line 1, in <module>
    partition_load_func()
  File "/home/rosana.eljurdi/.local/lib/python3.10/site-packages/kedro/io/core.py", line 219, in load
    raise DatasetError(message) from exc
kedro.io.core.DatasetError: Failed while loading data from data set ImageDataSet(filepath=/home/rosana.eljurdi/Kedro_Project/pokemon/data/01_raw/pokemon-images-and-types/images/images/abomasnow.png, protocol=file).
'ImageDataSet' object has no attribute '_get_load_path'
```
l
Ok, two questions again then: are you sure you need to use a custom DataSet class? If so, you can change your `_load()` method to
```python
def _load(self) -> np.ndarray:
    with self._fs.open(self._filepath) as f:
        image = Image.open(f).convert("RGBA")
        return np.asarray(image)
```
Second question: have you considered using the built-in `pillow.ImageDataSet` (https://docs.kedro.org/en/stable/kedro_datasets.pillow.ImageDataSet.html)?
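The revised `_load()` boils down to decoding with Pillow, forcing RGBA, and handing the pixels to NumPy. A standalone sketch of that conversion for local files only: it uses the built-in `open()` instead of fsspec, so unlike the dataset class it does not handle remote protocols, and `load_rgba_array` is a hypothetical name.

```python
import numpy as np
from PIL import Image


def load_rgba_array(filepath: str) -> np.ndarray:
    """Read a local image file and return it as an (H, W, 4) uint8 array.

    Same conversion as the revised _load(): decode with Pillow,
    force RGBA, then hand the pixels to NumPy.
    """
    with open(filepath, "rb") as f:
        # convert() returns a new, fully loaded image, so it stays usable
        # after the file is closed
        image = Image.open(f).convert("RGBA")
    return np.asarray(image)
```

Note that NumPy orders the axes as (height, width, channels), while Pillow reports image size as (width, height).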
r
It worked 😄😄. Thank you so much!
l
For future reference and for other people: I see now where you got the dataset from, and even better is to use the example in the tutorial from the docs.

Non-versioned: https://docs.kedro.org/en/stable/data/how_to_create_a_custom_dataset.html#the-complete-example
• Inherits from `AbstractDataSet`
• More complete `_load()` method than the last example I sent 👍

And the versioned dataset variant: https://docs.kedro.org/en/stable/data/how_to_create_a_custom_dataset.html#how-to-implement-versioning-in-your-dataset
• Inherits from `AbstractVersionedDataSet`
• Be sure to include `version` as an argument to `__init__`
• Call `super().__init__(..., version=version)` to make sure it's instantiated

I think at some point you mixed those two classes, and that's where it broke.
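The two variants can be contrasted with simplified stand-ins. This is only a sketch: `VersionedBaseStandIn` mimics the part of a versioned base class that matters here (storing `_version` and choosing a load path from it). It is not Kedro's implementation, and the real versioned classes can resolve the latest version automatically rather than requiring it explicitly. In Kedro, the non-versioned class would inherit from `AbstractDataSet` and the versioned one from `AbstractVersionedDataSet`.

```python
from pathlib import PurePosixPath
from typing import Optional


class VersionedBaseStandIn:
    """Simplified stand-in for a versioned base class; NOT Kedro's implementation."""

    def __init__(self, filepath: PurePosixPath, version: Optional[str] = None):
        self._filepath = filepath
        self._version = version  # set because subclasses call super().__init__

    def _get_load_path(self) -> PurePosixPath:
        if not self._version:
            return self._filepath  # versioning disabled: plain filepath
        # Simplified <filepath>/<version>/<filename> layout; the version is
        # passed explicitly here instead of being resolved automatically
        return self._filepath / self._version / self._filepath.name


class NonVersionedImageDataSet:  # in Kedro: inherit from AbstractDataSet
    def __init__(self, filepath: str):
        self._filepath = PurePosixPath(filepath)

    def load_path(self) -> PurePosixPath:
        return self._filepath  # no _get_load_path(): use the filepath directly


class VersionedImageDataSet(VersionedBaseStandIn):  # in Kedro: AbstractVersionedDataSet
    def __init__(self, filepath: str, version: Optional[str] = None):
        # Pass version through so the base class assigns self._version
        super().__init__(filepath=PurePosixPath(filepath), version=version)

    def load_path(self) -> PurePosixPath:
        return self._get_load_path()
```

With `version=None` the versioned variant falls back to the plain filepath, mirroring the `if not self._version:` check from the earlier error. Mixing the two (versioned parent, but non-versioned `__init__`) is what leaves `_version` unset.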
r
Mm. Yes, that is right! Thank you a lot for the help.
j
Thanks @Lodewic van Twillert for chiming in! 🙌🏼