#questions

Alexander Johns

02/15/2023, 6:19 PM
Hey team, I'm trying to implement a very simple custom dataset. It loops through a directory, reads the CSVs whose names match a string pattern into pandas DataFrames, performs basic cleaning on each DataFrame, and concatenates them. The class definition is located at:
src/<my_project>/extras
├── __init__.py
└── datasets
    ├── __init__.py
    └── <my_custom_dataset>.py
catalog entry:
raw_custom_dataset:
  type: <my_project>.extras.datasets.<my_custom_dataset>.<MyCustomDataSet>
  filepath: 01_raw/folder/*
When I run the node, I keep getting the following error:
An exception occurred when parsing config for DataSet 'raw_custom_dataset':
Class '<my_project>.extras.datasets.<my_custom_dataset>.<MyCustomDataSet>' not found or one of
its dependencies has not been installed.
Kedro version: 0.18.3

Deepyaman Datta

02/15/2023, 6:50 PM
Can you do
from <my_project>.extras.datasets.<my_custom_dataset> import <MyCustomDataSet>
?
Maybe you just have a missing dependency or other error and the error message is confusing
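
A note on the check suggested above: the same import can be run from a small script, in the same environment that runs the pipeline, so that the real exception is shown instead of the catalog's generic message. The sketch below is an illustration only. `load_class` is a hypothetical helper, not a Kedro function, and a standard-library path stands in for the project's own `type:` value.

```python
import importlib


def load_class(dotted_path: str):
    """Import the module part of ``dotted_path`` and return the named attribute.

    Hypothetical debugging helper: errors propagate unchanged, so the real
    cause (for example a missing third-party package) stays visible.
    """
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"Expected 'package.module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_path)  # ModuleNotFoundError if an import fails
    return getattr(module, class_name)  # AttributeError if the class is absent


# A standard-library path stands in for the project's own dataset class here.
ordered_dict_cls = load_class("collections.OrderedDict")
print(ordered_dict_cls.__name__)
```

Passing the exact string from the catalog's `type:` field rules out typos between the catalog entry and a hand-typed import statement.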

Alexander Johns

02/15/2023, 7:13 PM
@Deepyaman Datta I am able to import the dataset and create an instance of it. However, the catalog entry still throws the error when it is parsed. Does anyone have insight into which dependencies might be missing? 🙂

Deepyaman Datta

02/15/2023, 11:15 PM
This does basically mean it's not able to import the package/load the object: https://github.com/kedro-org/kedro/blob/0.18.3/kedro/io/core.py#L387-L396 Can you share the full stack trace?
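
To illustrate why such a message can hide the real problem: a loader that catches import errors and only reports "not found" gives the same result whether the module path is wrong or a package imported inside the module is missing. The sketch below is a simplified stand-in written for this note, not Kedro's actual implementation. `quiet_load`, `demo_dataset` and `missing_dependency_xyz` are made-up names.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def quiet_load(dotted_path: str):
    """Return the named object, or None if anything about the import fails."""
    module_path, _, name = dotted_path.rpartition(".")
    try:
        return getattr(importlib.import_module(module_path), name)
    except (ImportError, AttributeError, ValueError):
        return None  # the original exception, and its message, is discarded here


# Write a throwaway module whose only problem is a missing third-party import.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "demo_dataset.py").write_text(
    "import missing_dependency_xyz  # hypothetical package that is not installed\n"
    "class DemoDataSet:\n"
    "    pass\n"
)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

# The quiet loader only reports that nothing was found.
result = quiet_load("demo_dataset.DemoDataSet")
print(result)

# A direct import names the package that is actually missing.
try:
    importlib.import_module("demo_dataset")
except ModuleNotFoundError as exc:
    missing_name = exc.name
    print(f"Underlying cause: {exc}")
```

This is why a full stack trace, or a direct import in the same environment, tends to be more informative than the summary message.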

Alexander Johns

02/15/2023, 11:19 PM
Resolved after setting the version in the __init__.py files for the newly created modules 😅 Whoops, haha. Thanks for your help, @Deepyaman Datta!
👍 1

datajoely

02/16/2023, 10:26 AM
That’s a nasty one
I’m not sure if it’s a terrible idea to do a check for this sort of thing and give users a helpful error message or not?
@Deepyaman Datta what are your thoughts?

Deepyaman Datta

02/16/2023, 2:24 PM
TBH I didn't understand how setting version fixed this? 😅

datajoely

02/16/2023, 2:26 PM
I was thinking more about detecting whether __init__.py is present when a dataset fails to load
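
The check described here could look roughly like the following. This is a sketch of the idea only, not an existing Kedro feature. `find_missing_init_files` is a hypothetical helper, and it assumes the package lives under a known source root such as `src/`. Python can also import directories without `__init__.py` as namespace packages, so a missing file is a hint rather than proof of the cause.

```python
import tempfile
from pathlib import Path
from typing import List


def find_missing_init_files(source_root: Path, dotted_module: str) -> List[Path]:
    """List package directories on the way to ``dotted_module`` lacking __init__.py.

    ``dotted_module`` is a module path such as "pkg.extras.datasets.my_dataset";
    the last component is treated as a module file, the rest as packages.
    """
    missing = []
    current = source_root
    for package_name in dotted_module.split(".")[:-1]:
        current = current / package_name
        if current.is_dir() and not (current / "__init__.py").is_file():
            missing.append(current)
    return missing


# Build a small throwaway layout: pkg/ has __init__.py, pkg/extras/ does not.
root = Path(tempfile.mkdtemp())
(root / "pkg" / "extras").mkdir(parents=True)
(root / "pkg" / "__init__.py").touch()
(root / "pkg" / "extras" / "my_dataset.py").touch()

missing_dirs = find_missing_init_files(root, "pkg.extras.my_dataset")
print([str(path.relative_to(root)) for path in missing_dirs])
```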

Deepyaman Datta

02/16/2023, 2:27 PM
The structure @Alexander Johns provided did have __init__.py

datajoely

02/16/2023, 2:55 PM
🤦 Now I’m lost too

Alexander Johns

02/16/2023, 2:59 PM
It had the __init__.py files initially. Only once I declared the version in __init__.py to match the rest of the project was I able to import the custom dataset class.

Deepyaman Datta

02/16/2023, 3:13 PM
There's no reason most __init__.py files should need a version

Praneeth Nooli

02/21/2023, 7:26 AM
Hi @Alexander Johns, do you mind sharing a sample of what __init__.py should look like? I’m facing the same error you mentioned, even after adding the version number to the __init__.py files.

datajoely

02/21/2023, 9:57 AM
We’re still not sure this is what fixed it - are you getting the same error as above?

Praneeth Nooli

02/21/2023, 1:34 PM
Yes
But in my case, I have created a custom Athena Dataset Class to load the data from the AWS Athena table

datajoely

02/21/2023, 1:49 PM
In a Python or Jupyter session, can you actually do an import like this?
from a.b.c import CustomDataSet

Praneeth Nooli

02/21/2023, 3:37 PM
Thank you for your suggestion, @datajoely. I was able to find the error: a package used by my class was not installed. After I installed it, I was able to use my custom class.
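
The cause found here, a package that the custom class imports but that is not installed, can be checked up front. A minimal sketch, assuming the top-level module names the dataset needs are known. `missing_modules` is a hypothetical helper, and `not_installed_pkg_xyz` is a made-up name standing in for whatever package the dataset requires.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# "json" ships with Python; the second name is deliberately made up.
not_found = missing_modules(["json", "not_installed_pkg_xyz"])
print(not_found)
```

Running this in the same environment as `kedro run` matters, since a package installed in one environment (for example a notebook kernel) may be absent from another.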

datajoely

02/21/2023, 3:37 PM
amazing!