# questions
g
*Issue: PickleDataSet Class Not Found Error in Kedro Project*
Hello Kedro Community, I'm currently working on a Kedro project and encountered an error when defining a dataset in my catalog.yml file. The error message I receive is as follows:
DatasetError: An exception occurred when parsing config for dataset 'df_us_5': Class 'pickle.PickleDataSet' not found, is this a typo?
This error arises when I try to configure a dataset of type pickle.PickleDataSet in my data catalog. Here is the configuration I used:
"df_{name_us}":
type: pickle.PickleDataSet
filepath: data/02_intermediate/df_{name_us}.joblib
I've also tried using the fully qualified name kedro.extras.datasets.pickle.PickleDataSet, but then I encounter a similar error message:
DatasetError: An exception occurred when parsing config for dataset 'df_us_5': Class 'kedro.extras.datasets.pickle.PickleDataSet' not found, is this a typo?
I already tried using Dataset and DataSet. For context, I am using Kedro version 0.19.3 and Python 3.10.13. Has anyone encountered this issue before, or does anyone know if there have been any recent changes to how datasets of type PickleDataSet should be defined in the data catalog? Any insights or suggestions on how to resolve this issue would be greatly appreciated. Thank you in advance for your help!
d
It’s now lowercase Dataset
g
I already tried using Dataset and DataSet
d
so on 0.19.x it should be lowercase now
👍 1
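(For illustration, a minimal sketch of the catalog entry with the lowercase class name, reusing the dataset name and filepath from the question:)
```yaml
"df_{name_us}":
  type: pickle.PickleDataset   # lowercase "Dataset" on Kedro 0.19.x
  filepath: data/02_intermediate/df_{name_us}.joblib
```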
can you paste the full error when you make it lowercase?
👍 1
g
(peloptimize) C:\Dev\kedro_pelopt\sentinela-palletizing\peloptmize>kedro run --pipeline data_processing
[04/09/24 11:26:53] INFO     Kedro project peloptmize                                                     session.py:321
[04/09/24 11:26:54] INFO     Using synchronous mode for loading and saving data. Use the --async flag for potential performance gains.   sequential_runner.py:64
                             https://docs.kedro.org/en/stable/nodes_and_pipelines/run_a_pipeline.html#load-and-save-asynchronously
WARNING  No nodes ran. Repeat the previous command to attempt a new run.               runner.py:214
Traceback (most recent call last):
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\io\core.py", line 152, in from_config
class_obj, config = parse_dataset_definition(
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\io\core.py", line 405, in parse_dataset_definition
raise DatasetError(f"Class '{dataset_type}' not found, is this a typo?")
kedro.io.core.DatasetError: Class 'pickle.PickleDataset' not found, is this a typo?
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\Scripts\kedro.exe\__main__.py", line 7, in <module>
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\framework\cli\cli.py", line 198, in main
cli_collection()
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\click\core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\framework\cli\cli.py", line 127, in main
super().main(
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\click\core.py", line 1078, in main
rv = self.invoke(ctx)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\click\core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\click\core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\click\core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\framework\cli\project.py", line 225, in run
session.run(
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\framework\session\session.py", line 392, in run
run_result = runner.run(
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\runner\runner.py", line 117, in run
self._run(pipeline, catalog, hook_or_null_manager, session_id)  # type: ignore[arg-type]
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\runner\sequential_runner.py", line 75, in _run
run_node(node, catalog, hook_manager, self._is_async, session_id)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\runner\runner.py", line 331, in run_node
node = _run_node_sequential(node, catalog, hook_manager, session_id)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\runner\runner.py", line 412, in _run_node_sequential
inputs[name] = catalog.load(name)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\io\data_catalog.py", line 481, in load
dataset = self._get_dataset(name, version=load_version)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\io\data_catalog.py", line 380, in _get_dataset
dataset = AbstractDataset.from_config(
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\io\core.py", line 156, in from_config
raise DatasetError(
kedro.io.core.DatasetError: An exception occurred when parsing config for dataset 'df_us_5':
Class 'pickle.PickleDataset' not found, is this a typo?
d
do you have joblib installed? I appreciate it’s a confusing error if so
oh, and have you provided engine = joblib?
g
joblib                            1.3.2
How can I use engine = joblib? Like this: "df_{name_us}": type: pickle.PickleDataset filepath: data/02_intermediate/df_{name_us}.joblib engine: joblib ?
d
exactly
g
(peloptimize) C:\Dev\kedro_pelopt\sentinela-palletizing\peloptmize>kedro run --pipeline data_processing
[04/09/24 11:33:18] INFO     Kedro project peloptmize                                                     session.py:321
                    INFO     Using synchronous mode for loading and saving data. Use the --async flag for potential performance gains.   sequential_runner.py:64
                             https://docs.kedro.org/en/stable/nodes_and_pipelines/run_a_pipeline.html#load-and-save-asynchronously
WARNING  No nodes ran. Repeat the previous command to attempt a new run.               runner.py:214
Traceback (most recent call last):
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\io\core.py", line 152, in from_config
class_obj, config = parse_dataset_definition(
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\io\core.py", line 405, in parse_dataset_definition
raise DatasetError(f"Class '{dataset_type}' not found, is this a typo?")
kedro.io.core.DatasetError: Class 'pickle.PickleDataset' not found, is this a typo?
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\Scripts\kedro.exe\__main__.py", line 7, in <module>
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\framework\cli\cli.py", line 198, in main
cli_collection()
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\click\core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\framework\cli\cli.py", line 127, in main
super().main(
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\click\core.py", line 1078, in main
rv = self.invoke(ctx)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\click\core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\click\core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\click\core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\framework\cli\project.py", line 225, in run
session.run(
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\framework\session\session.py", line 392, in run
run_result = runner.run(
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\runner\runner.py", line 117, in run
self._run(pipeline, catalog, hook_or_null_manager, session_id)  # type: ignore[arg-type]
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\runner\sequential_runner.py", line 75, in _run
run_node(node, catalog, hook_manager, self._is_async, session_id)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\runner\runner.py", line 331, in run_node
node = _run_node_sequential(node, catalog, hook_manager, session_id)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\runner\runner.py", line 412, in _run_node_sequential
inputs[name] = catalog.load(name)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\io\data_catalog.py", line 481, in load
dataset = self._get_dataset(name, version=load_version)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\io\data_catalog.py", line 380, in _get_dataset
dataset = AbstractDataset.from_config(
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\io\core.py", line 156, in from_config
raise DatasetError(
kedro.io.core.DatasetError: An exception occurred when parsing config for dataset 'df_us_5':
Class 'pickle.PickleDataset' not found, is this a typo?
Same error
d
hmm
In a Jupyter notebook, can you do the following?
from kedro.datasets.pickle import PickleDataset
g
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[1], line 1
----> 1 from kedro.datasets.pickle import PickleDataset
ModuleNotFoundError: No module named 'kedro.datasets'
(I get this error running: from kedro.datasets.pickle import PickleDataset)
Now I see, I have to do:
pip install kedro-datasets
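(Side note: once kedro-datasets is installed, the class is imported from the kedro_datasets package rather than kedro.datasets; a minimal notebook check, using a hypothetical filepath, might look like this:)
```python
# Quick sanity check that kedro-datasets is importable; the package name is
# kedro_datasets, while the catalog "type" string stays pickle.PickleDataset.
from kedro_datasets.pickle import PickleDataset

ds = PickleDataset(filepath="data/02_intermediate/df_example.joblib")  # hypothetical path
print(type(ds))
```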
d
oh
gotcha
and have you installed it with the dataset extras?
pip install "kedro-datasets[pickle.PickleDataset]"
❤️ 1
g
# 02 Intermediate
"df_{name_us}":
type: pickle.PickleDataset
filepath: data/02_intermediate/df_{name_us}.joblib
engine: joblib
kedro.io.core.DatasetError:
PickleDataset.__init__() got an unexpected keyword argument 'engine'.
Dataset 'df_us_5' must only contain arguments valid for the constructor of 'kedro_datasets.pickle.pickle_dataset.PickleDataset'.
I used backend: joblib instead of engine: joblib and it is working 🙂 Thank you very much @datajoely
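(For reference, a sketch of the working catalog entry described above, assuming the same dataset name and filepath:)
```yaml
# 02 Intermediate
"df_{name_us}":
  type: pickle.PickleDataset
  filepath: data/02_intermediate/df_{name_us}.joblib
  backend: joblib   # "backend" (not "engine") selects joblib for (de)serialisation
```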
d
💪
❤️ 1
I think we could do a better job with that error message though
💡 1
n
May I ask what Kedro version you are using?
Oops, I saw it's 0.19.3.
I wonder where your error is coming from.
_DEFAULT_PACKAGES = ["kedro.io.", "kedro_datasets.", ""]
In 0.19, Kedro does not look for kedro.extras anymore
DatasetError: An exception occurred when parsing config for dataset 'df_us_5': Class 'kedro.extras.datasets.pickle.PickleDataSet' not found, is this a typo?
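(To make the lookup order concrete, here is a simplified sketch, not Kedro's actual implementation, of how a catalog type string is resolved against those default package prefixes; pickle.PickleDataset only resolves once kedro_datasets is importable, and kedro.extras is never tried on 0.19:)
```python
from importlib import import_module

_DEFAULT_PACKAGES = ["kedro.io.", "kedro_datasets.", ""]

def resolve(dataset_type: str):
    """Try each default package prefix until the dataset class can be imported."""
    for prefix in _DEFAULT_PACKAGES:
        module_path, _, class_name = (prefix + dataset_type).rpartition(".")
        try:
            return getattr(import_module(module_path), class_name)
        except (ModuleNotFoundError, AttributeError):
            continue
    raise ValueError(f"Class '{dataset_type}' not found, is this a typo?")

# With kedro-datasets installed, "pickle.PickleDataset" resolves via the
# "kedro_datasets." prefix; without it, every attempt fails and an error like the
# one above is raised.
print(resolve("pickle.PickleDataset"))
```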