Hi all I have worked with kedro many times in different oper Kedro #questions

Nicolas Betancourt Cardona

10/03/2024, 4:16 PM

Hi all! I have worked with Kedro many times on different operating systems and I have never had issues with catalog path entries. It has always been fine to define catalog entries such as

catalog_entry:
  type: AnyDataset
  filepath: data/01_raw/file.extension

whether on Windows or Mac. Now I'm having an issue with it for the first time. It turns out that the following catalog entry

problematic_catalog_entry:
  type: MyCustomDataSet
  mainfolderpath: data/01_raw/file.extension

raises a `winerror 3 the system cannot find the path specified` when loaded from a Kedro Jupyter Notebook, but

problematic_catalog_entry_2:
  type: MyCustomDataSet
  mainfolderpath: C:\same\path\but\absolute\data\01_raw\file.extension

doesn't. This is absolutely my fault because the dataset type I'm using is a custom `AbstractDataset`, but I don't have this problem with other custom `AbstractDataset` classes. I will attach my `_load` method because the problem might be there:

import os  # module-level import used by _load

def _load(self):
    # Collect the immediate subfolders of the main folder.
    subfolder_names = [
        subfolder_name
        for subfolder_name in os.listdir(self._mainfolderpath)
        if os.path.isdir(os.path.join(self._mainfolderpath, subfolder_name))
    ]

    # Map each subfolder to {wav file name without extension: wav file path}.
    wav_paths_dict = {}
    for subfolder_name in subfolder_names:
        subfolder_path = os.path.join(self._mainfolderpath, subfolder_name)
        wav_files = {}
        for root, _dirs, files in os.walk(subfolder_path):
            for file in files:
                # splitext handles any casing of the extension (.wav, .WAV, .Wav).
                wav_file_name, extension = os.path.splitext(file)
                if extension.lower() == '.wav':
                    wav_files[wav_file_name] = os.path.join(root, file)
        wav_paths_dict[subfolder_name] = wav_files

    # Load every wav file through SoundDataset, keeping the same nesting.
    partitioned_dataset_dict = {}
    for subfolder_name, sub_dict in wav_paths_dict.items():
        partitioned_dataset_dict[subfolder_name] = {
            wav_file_name: SoundDataset(wav_file_path).load()
            for wav_file_name, wav_file_path in sub_dict.items()
        }

    return partitioned_dataset_dict

In `__init__` I'm initializing `self._mainfolderpath` this way: `self._mainfolderpath = PurePosixPath(mainfolderpath)`. Thank you very much for your help again.

Nok Lam Chan

10/03/2024, 4:23 PM

Is it possible to create a minimal example? https://stackoverflow.com/help/minimal-reproducible-example

Nok Lam Chan

10/03/2024, 4:27 PM

Is the problem that it handles the relative path but fails to process the Windows path?

> raises a `winerror 3 the system cannot find the path specified` when loaded from a Kedro Jupyter Notebook

Which lines of code give you this error? You should be able to tell from the stack trace, or simply print out the path.

Nicolas Betancourt Cardona

10/03/2024, 4:32 PM

It seems that the problem is only in Jupyter. The line of code that raises the error is `catalog.load('problematic_catalog_entry')` in a Kedro Jupyter notebook (this is the catalog entry with the relative path). Meanwhile, the line `catalog.load('problematic_catalog_entry_2')` does not raise an error. I just ran `kedro run --node test_node` from my terminal, where `test_node` has `problematic_catalog_entry` as input, and it does not raise an error. This is the same catalog entry that raises an error in Jupyter.

Nok Lam Chan

10/03/2024, 4:33 PM

How did you create the catalog? Or are you using the default one that comes with the extension? For example, you can run `%load_ext kedro.ipython`, which should load a global `catalog` for you.

Nicolas Betancourt Cardona

10/03/2024, 4:35 PM

I'm using the default one that comes with the Jupyter extension.

Nok Lam Chan

10/03/2024, 4:38 PM

Thanks, that rings a bell. https://github.com/kedro-org/kedro/issues/2942

Nicolas Betancourt Cardona

10/03/2024, 4:38 PM

Running `os.chdir("/path/to/kedro/project")` fixed the problem.

Nok Lam Chan

10/03/2024, 4:39 PM

Kedro makes a best effort to do these path conversions.

Nok Lam Chan

10/03/2024, 4:40 PM

The problem here is that when you use a relative path in Python, it's always relative to your working directory. That means when you run a notebook, your working directory is `project/notebooks` (I guess that's where your notebooks are).
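To illustrate the point about working directories, here is a minimal sketch that builds a throwaway folder layout and checks the same relative path from two working directories. The `project/notebooks` layout and the helper name `exists_from` are assumptions for illustration only, not anything from the actual project.

```python
import os
import tempfile
from pathlib import Path


def exists_from(cwd: Path, relative: str) -> bool:
    """Return whether `relative` exists when the working directory is `cwd`."""
    previous = Path.cwd()
    os.chdir(cwd)
    try:
        # A relative path is resolved against the current working directory.
        return Path(relative).exists()
    finally:
        # Always restore the original working directory.
        os.chdir(previous)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: project/data/01_raw and project/notebooks.
    project = Path(tmp) / "project"
    (project / "data" / "01_raw").mkdir(parents=True)
    (project / "notebooks").mkdir()

    from_root = exists_from(project, "data/01_raw")
    from_notebooks = exists_from(project / "notebooks", "data/01_raw")

print(from_root, from_notebooks)
```

The same string `data/01_raw` is found from the project root but not from `notebooks`, which matches the behaviour seen with `kedro run` versus the notebook.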

Nok Lam Chan

10/03/2024, 4:41 PM

We try to detect some keywords to do conversion automatically

conf_keys_with_filepath = ("filename", "filepath", "path")

But in your case the conversion didn't happen. So you will likely have to handle that yourself.
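A simplified illustration of why `mainfolderpath` was skipped: the sketch below is not Kedro's actual code, only an approximation of keyword-based conversion under the assumption that keys are matched exactly against the tuple quoted above. The function name `convert_paths` and the `/proj` root are hypothetical.

```python
from pathlib import Path

# The keyword tuple quoted in the message above.
CONF_KEYS_WITH_FILEPATH = ("filename", "filepath", "path")


def convert_paths(project_path: Path, entry: dict) -> dict:
    """Make relative paths absolute, but only for recognised keys (simplified)."""
    converted = {}
    for key, value in entry.items():
        is_path_key = key in CONF_KEYS_WITH_FILEPATH
        if is_path_key and isinstance(value, str) and not Path(value).is_absolute():
            # Anchor the relative path at the project root.
            converted[key] = (project_path / value).as_posix()
        else:
            # Unrecognised keys such as `mainfolderpath` pass through untouched.
            converted[key] = value
    return converted


root = Path("/proj")  # hypothetical project root
standard = convert_paths(root, {"type": "AnyDataset", "filepath": "data/01_raw/file.extension"})
custom = convert_paths(root, {"type": "MyCustomDataSet", "mainfolderpath": "data/01_raw/file.extension"})
print(standard["filepath"])
print(custom["mainfolderpath"])
```

Under that assumption, one option might be to name the dataset argument `filepath` or `path` so the automatic conversion applies; otherwise the dataset has to resolve the path itself.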

Nok Lam Chan

10/03/2024, 4:42 PM

You can find the logic here: https://github.com/kedro-org/kedro/blob/f1d37513097471fa868e0b1e0d917c1ba7c35894/kedro/framework/context/context.py#L59 Unfortunately, there is no easy way for you to extend that keyword list, so this has to go into your dataset implementation.
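One possible way to handle this inside the dataset, sketched under stated assumptions: the helpers `find_project_root` and `resolve_dataset_path` are hypothetical names, and the sketch assumes the project root can be recognised by a `pyproject.toml` file in a parent of the working directory. It uses only `pathlib` and does not call any Kedro API.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "pyproject.toml") -> Path:
    """Walk upwards from `start` until a folder containing `marker` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No {marker} found in {start} or its parents")


def resolve_dataset_path(path: str, start: Optional[Path] = None) -> Path:
    """Return absolute paths unchanged; anchor relative ones at the project root."""
    candidate = Path(path)
    if candidate.is_absolute():
        return candidate
    return find_project_root(start or Path.cwd()) / candidate


# Demo with a throwaway layout: project/pyproject.toml and project/notebooks.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project"
    (project / "notebooks").mkdir(parents=True)
    (project / "pyproject.toml").write_text("[tool.kedro]\n")

    resolved = resolve_dataset_path(
        "data/01_raw/file.extension", start=project / "notebooks"
    )
    expected = project.resolve() / "data" / "01_raw" / "file.extension"

print(resolved == expected)
```

In the custom dataset's `__init__`, this could replace the `PurePosixPath(mainfolderpath)` assignment with something like `self._mainfolderpath = resolve_dataset_path(mainfolderpath)`, so the entry loads the same way from a notebook and from `kedro run`.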

Nicolas Betancourt Cardona

10/03/2024, 4:46 PM

This helped me a lot. Thank you very much @Nok Lam Chan, you are always so nice :)

❤️ 2
