# questions
a
Hey team, I'm trying to implement a very simple custom dataset. It loops through a directory, reads the CSVs whose names match a string pattern into pandas DataFrames, performs basic cleaning on each DataFrame, and concatenates them together. The class definition is located at:
src/<my_project>/extras
├── __init__.py
└── datasets
    ├── __init__.py
    └── <my_custom_dataset>.py
catalog entry:
raw_custom_dataset:
  type: <my_project>.extras.datasets.<my_custom_dataset>.<MyCustomDataSet>
  filepath: 01_raw/folder/*
When I run the node, I keep getting the following error:
An exception occurred when parsing config for DataSet 'raw_custom_dataset':
Class '<my_project>.extras.datasets.<my_custom_dataset>.<MyCustomDataSet>' not found or one of
its dependencies has not been installed.
Kedro version: 0.18.3
d
Can you run the following?
from <my_project>.extras.datasets.<my_custom_dataset> import <MyCustomDataSet>
Maybe you just have a missing dependency or other error and the error message is confusing
a
@Deepyaman Datta I am able to import the dataset and create an instance of it; however, the catalog entry still throws the error when it is parsed. Wondering if anyone has insight into which dependencies might be missing 🙂
d
This basically means it's not able to import the package or load the object: https://github.com/kedro-org/kedro/blob/0.18.3/kedro/io/core.py#L387-L396 Can you share the full stack trace?
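To see which of those two steps fails, the lookup can be reproduced by hand in a Python session. The following is a minimal sketch and not Kedro's actual code: the helper name try_load_class is hypothetical, and it only assumes that the catalog `type` is a dotted path ending in the class name.

```python
import importlib


def try_load_class(class_path):
    """Return (obj, error) for a dotted path such as 'pkg.module.ClassName'.

    Hypothetical helper for debugging; it mimics the general idea of
    importing the module and then looking up the class on it.
    """
    module_path, _, class_name = class_path.rpartition(".")
    try:
        module = importlib.import_module(module_path)
    except (ImportError, ValueError) as exc:
        # ImportError also covers ModuleNotFoundError (e.g. a missing
        # third-party package imported at the top of the dataset module).
        # ValueError is raised when the path has no module part at all.
        return None, exc
    try:
        return getattr(module, class_name), None
    except AttributeError as exc:
        # The module imported fine, but the class name is wrong.
        return None, exc


# Example with a standard-library class; swap in your own catalog `type`.
obj, error = try_load_class("collections.OrderedDict")
print(obj, repr(error))
```

If the returned error is an ImportError, its message usually names the module that could not be imported, which is more specific than the "not found or one of its dependencies has not been installed" message.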
a
Resolved after setting the version in the __init__.py files 😅 for the newly created modules. Whoops, haha. Thanks for your help, @Deepyaman Datta
d
That’s a nasty one
I’m not sure if it’s a terrible idea to do a check for this sort of thing and give users a helpful error message or not?
@Deepyaman Datta what are your thoughts?
d
TBH I didn't understand how setting version fixed this? 😅
d
I was thinking more about detecting whether __init__.py is present when failing to load a dataset
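A rough sketch of what such a check could look like, outside of Kedro itself. The function name packages_missing_init and the example paths are hypothetical. Note that Python 3 can import directories without __init__.py as namespace packages, so a hit here is a hint rather than proof of the cause.

```python
from pathlib import Path


def packages_missing_init(source_root, dotted_module):
    """List package directories on the way to `dotted_module` that lack __init__.py.

    `source_root` is the directory that contains the top-level package
    (for a Kedro project, typically `src`). Hypothetical helper.
    """
    missing = []
    current = Path(source_root)
    # The last part is the module file itself, so only walk the package parts.
    for part in dotted_module.split(".")[:-1]:
        current = current / part
        if current.is_dir() and not (current / "__init__.py").is_file():
            missing.append(current)
    return missing


# Example call with placeholder names; adjust to your own project layout.
print(packages_missing_init("src", "my_project.extras.datasets.my_custom_dataset"))
```

An empty list means every existing package directory on the path has an __init__.py, which matches the structure shared at the top of this thread.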
d
The structure @Alexander Johns provided did have __init__.py
d
🤦 Now I’m lost too
a
It had the __init__.py files initially. Only once I declared the version in the __init__.py files to match the rest of the project was I able to import the custom dataset class.
d
There's no reason most __init__.py files should need a version
p
Hi @Alexander Johns, do you mind posting a sample of what the __init__.py should look like? I’m facing the same error you mentioned, even after adding the version number to the __init__.py files.
d
We’re still not sure this is what fixed it - are you getting the same error as above?
p
Yes
But in my case, I have created a custom Athena Dataset Class to load the data from the AWS Athena table
d
In a Python or Jupyter session, can you actually do an import like this?
from a.b.c import CustomDataSet
p
Thank you for your suggestion @datajoely, I was able to find the error. A package that my custom class imports wasn't installed, and that caused the error. After I installed the package, I was able to use my custom class.
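For anyone hitting the same thing, the name of the missing package can be pulled out of the import failure directly. This is a small sketch under the assumption that the dataset module fails with a ModuleNotFoundError; the helper name find_missing_dependency is hypothetical.

```python
import importlib


def find_missing_dependency(module_path):
    """Return the name of the module that could not be found, or None.

    Hypothetical helper: it imports `module_path` and, if that fails with
    ModuleNotFoundError, reports which module was missing. The missing
    module may be a dependency imported inside `module_path`, not the
    module itself.
    """
    try:
        importlib.import_module(module_path)
    except ModuleNotFoundError as exc:
        return exc.name
    return None


# Example with a standard-library module, which should import cleanly.
print(find_missing_dependency("json"))
```

Passing the module part of the catalog `type` (everything before the class name) shows whether the failure comes from the project package or from a third-party package that still needs installing.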
d
amazing!