# questions
n
Hi all! I have worked with Kedro many times on different operating systems and I have never had issues with catalog path entries. It has always been fine to define catalog entries such as
catalog_entry:
  type: AnyDataset
  filepath: data/01_raw/file.extension
whether on Windows or Mac. Now I'm having an issue with it for the first time. It turns out that the following catalog entry
problematic_catalog_entry:
  type: MyCustomDataSet
  mainfolderpath: data/01_raw/file.extension
raises a
winerror 3 the system cannot find the path specified
when loaded from a Kedro Jupyter Notebook but
problematic_catalog_entry_2:
  type: MyCustomDataSet
  mainfolderpath: C:\same\path\but\absolute\data\01_raw\file.extension
doesn't. This is absolutely my fault because the data set type I'm using is a custom
AbstractDataset
but I don't have this problem with other custom
AbstractDataset
. I will attach my
_load
method because the problem might be there
def _load(self):
    # Collect the immediate subfolders of the main folder.
    subfolder_names = [
        subfolder_name
        for subfolder_name in os.listdir(self._mainfolderpath)
        if os.path.isdir(os.path.join(self._mainfolderpath, subfolder_name))
    ]

    # Map each subfolder to {wav file name without extension: full path}.
    wav_paths_dict = {}
    for subfolder_name in subfolder_names:
        subfolder_path = os.path.join(self._mainfolderpath, subfolder_name)
        wav_files = []
        for root, dirs, files in os.walk(subfolder_path):
            for file in files:
                if file.lower().endswith('.wav'):
                    wav_file_path = os.path.join(root, file)
                    wav_file_name = os.path.split(wav_file_path)[-1].replace('.wav', '').replace('.WAV', '')
                    wav_files.append((wav_file_name, wav_file_path))
            wav_paths_dict[subfolder_name] = dict(wav_files)

    # Load every wav file through SoundDataset, grouped by subfolder.
    partitioned_dataset_dict = {}
    for subfolder_name, sub_dict in wav_paths_dict.items():
        partitioned_dataset = [
            (wav_file_name, SoundDataset(wav_file_path).load())
            for wav_file_name, wav_file_path in sub_dict.items()
        ]
        partitioned_dataset_dict[subfolder_name] = dict(partitioned_dataset)

    return partitioned_dataset_dict
On
__init__
I'm initializing
self._mainfolderpath
this way:
self._mainfolderpath = PurePosixPath(mainfolderpath)
. Thank you very much for your help again
n
Is it possible to create a minimal example? https://stackoverflow.com/help/minimal-reproducible-example
Is the problem that it handles the relative path but fails to process the Windows path?
> raises a
winerror 3 the system cannot find the path specified
when loaded from a Kedro Jupyter Notebook but
Which lines of code give you this error? You should be able to tell from the stack trace, or simply print out the path.
n
It seems that the problem is only in Jupyter. The line of code that raises the error is
catalog.load('problematic_catalog_entry')
in a kedro jupyter notebook (this is the catalog entry with the relative path). Meanwhile the line
catalog.load('problematic_catalog_entry_2')
does not raise an error. I just ran
kedro run --node test_node
from my terminal, where test_node has
problematic_catalog_entry
as input and it does not raise an error. This is the same catalog entry that raises an error in Jupyter
n
How did you create the catalog? Or are you using the default one that comes with the extension? For example you can do
%load_ext kedro.ipython
, that should load up a global
catalog
for you.
n
I'm using the default one that comes with the jupyter extension
n
n
Running
os.chdir("/path/to/kedro/project")
fixed the problem
n
Kedro makes a best effort to do these path conversions.
The problem here is that, when you use a relative path in Python, it is always relative to your working directory. That means when you run a notebook, your working directory is `project/notebooks` (I guess that is where your notebooks are).
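To make the working-directory point concrete, here is a minimal sketch using only the standard library. The `project` and `notebooks` folders are hypothetical stand-ins created in a temporary directory, not a real Kedro project; the sketch only shows that the same relative path resolves differently depending on the current working directory.

```python
import os
import tempfile
from pathlib import Path

relative = Path("data/01_raw")

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: <tmp>/project/notebooks
    project = Path(tmp).resolve() / "project"
    notebooks = project / "notebooks"
    notebooks.mkdir(parents=True)

    original_cwd = Path.cwd()
    try:
        # Same relative path, two different working directories.
        os.chdir(project)
        from_project = relative.resolve()
        os.chdir(notebooks)
        from_notebooks = relative.resolve()
    finally:
        # Leave the temporary directory before it is deleted.
        os.chdir(original_cwd)

print(from_project)    # ends in project/data/01_raw
print(from_notebooks)  # ends in project/notebooks/data/01_raw
```

This matches the behaviour reported in the thread: the relative entry works with `kedro run` from the project root and fails from a notebook started in a different folder.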
We try to detect some keywords to do the conversion automatically:
conf_keys_with_filepath = ("filename", "filepath", "path")
But in your case the conversion didn't happen, because `mainfolderpath` is not one of those keys. So you will likely have to handle that yourself.
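A simplified sketch of this kind of keyword-based conversion is below. It is not Kedro's actual code: `convert_paths_sketch` and `looks_relative` are hypothetical names, the relative-path check is deliberately naive, and `/home/user/project` is a made-up project root. Only the `conf_keys_with_filepath` tuple is taken from the message above.

```python
from pathlib import PurePosixPath

# Keyword tuple quoted in the message above.
conf_keys_with_filepath = ("filename", "filepath", "path")


def looks_relative(value: str) -> bool:
    """Naive check (assumption): no URL scheme and not an absolute POSIX path."""
    return "://" not in value and not PurePosixPath(value).is_absolute()


def convert_paths_sketch(project_path: PurePosixPath, conf: dict) -> dict:
    """Make relative paths absolute, but only under the known keys."""
    converted = {}
    for key, value in conf.items():
        if isinstance(value, dict):
            # Catalog entries are nested dicts, so recurse into them.
            converted[key] = convert_paths_sketch(project_path, value)
        elif (
            isinstance(value, str)
            and key in conf_keys_with_filepath
            and looks_relative(value)
        ):
            converted[key] = (project_path / value).as_posix()
        else:
            # Any other key, such as mainfolderpath, is left untouched.
            converted[key] = value
    return converted


catalog_conf = {
    "catalog_entry": {
        "type": "AnyDataset",
        "filepath": "data/01_raw/file.extension",
    },
    "problematic_catalog_entry": {
        "type": "MyCustomDataSet",
        "mainfolderpath": "data/01_raw/file.extension",
    },
}

result = convert_paths_sketch(PurePosixPath("/home/user/project"), catalog_conf)
print(result["catalog_entry"]["filepath"])
print(result["problematic_catalog_entry"]["mainfolderpath"])
```

Under these assumptions the `filepath` entry becomes absolute while `mainfolderpath` stays relative, so it is later resolved against whatever the working directory happens to be.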
You can find the logic here: https://github.com/kedro-org/kedro/blob/f1d37513097471fa868e0b1e0d917c1ba7c35894/kedro/framework/context/context.py#L59 Unfortunately there is no easy way for you to just extend that keyword list, so this has to go into your dataset implementation.
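One possible way to handle this inside the dataset is sketched below, using only the standard library. Both helpers are hypothetical, and the approach rests on an assumption: that the project root can be recognised by a `pyproject.toml` file in the working directory or one of its parents. Adjust the marker if your project is laid out differently.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "pyproject.toml") -> Optional[Path]:
    """Walk up from `start` until a folder containing `marker` is found."""
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    return None


def resolve_mainfolderpath(mainfolderpath: str, start: Optional[Path] = None) -> Path:
    """Return an absolute path; relative input is anchored at the project root."""
    path = Path(mainfolderpath)
    if path.is_absolute():
        return path
    start = (start or Path.cwd()).resolve()
    root = find_project_root(start)
    # Fall back to the starting folder when no marker file is found.
    return (root or start) / path
```

In the dataset's `__init__` this could replace the direct assignment, for example `self._mainfolderpath = resolve_mainfolderpath(mainfolderpath)`, so that the same catalog entry resolves to the same folder from both `kedro run` and a notebook.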
n
This helped me a lot. Thank you very much @Nok Lam Chan, you are always so nice :)