Gabriel Aguiar
04/09/2024, 2:23 PM
PickleDataSet Class Not Found Error in Kedro Project
Hello Kedro Community,
I'm currently working on a Kedro project and encountered an error when defining a dataset in my catalog.yml file. The error message I receive is as follows:
DatasetError: An exception occurred when parsing config for dataset 'df_us_5': Class 'pickle.PickleDataSet' not found, is this a typo?
This error arises when I try to configure a dataset of type pickle.PickleDataSet in my data catalog. Here is the configuration I used:
"df_{name_us}":
  type: pickle.PickleDataSet
  filepath: data/02_intermediate/df_{name_us}.joblib
I've also tried using the fully qualified name kedro.extras.datasets.pickle.PickleDataSet, but then I encounter a similar error message:
DatasetError: An exception occurred when parsing config for dataset 'df_us_5': Class 'kedro.extras.datasets.pickle.PickleDataSet' not found, is this a typo?
I have already tried both the Dataset and DataSet spellings.
For context, I am using Kedro version 0.19.3 and Python 3.10.13.
Has anyone encountered this issue before, or does anyone know if there have been any recent changes to how datasets of type PickleDataSet should be defined in the data catalog? Any insights or suggestions on how to resolve this issue would be greatly appreciated.
Thank you in advance for your help!
datajoely
04/09/2024, 2:23 PM
Gabriel Aguiar
04/09/2024, 2:24 PM
datajoely
04/09/2024, 2:26 PM
datajoely
04/09/2024, 2:26 PM
Gabriel Aguiar
04/09/2024, 2:27 PM
(peloptimize) C:\Dev\kedro_pelopt\sentinela-palletizing\peloptmize>kedro run --pipeline data_processing
[04/09/24 11:26:53] INFO Kedro project peloptmize session.py:321
[04/09/24 11:26:54] INFO Using synchronous mode for loading and saving data. Use the --async flag for potential performance gains. https://docs.kedro.org/en/stable/nodes_and_pipelines/run_a_pipeline.html#load-and-save-asynchronously sequential_runner.py:64
WARNING No nodes ran. Repeat the previous command to attempt a new run. runner.py:214
Traceback (most recent call last):
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\io\core.py", line 152, in from_config
class_obj, config = parse_dataset_definition(
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\io\core.py", line 405, in parse_dataset_definition
raise DatasetError(f"Class '{dataset_type}' not found, is this a typo?")
kedro.io.core.DatasetError: Class 'pickle.PickleDataset' not found, is this a typo?
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\Scripts\kedro.exe\__main__.py", line 7, in <module>
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\framework\cli\cli.py", line 198, in main
cli_collection()
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\click\core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\framework\cli\cli.py", line 127, in main
super().main(
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\click\core.py", line 1078, in main
rv = self.invoke(ctx)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\click\core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\click\core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\click\core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\framework\cli\project.py", line 225, in run
session.run(
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\framework\session\session.py", line 392, in run
run_result = runner.run(
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\runner\runner.py", line 117, in run
self._run(pipeline, catalog, hook_or_null_manager, session_id) # type: ignore[arg-type]
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\runner\sequential_runner.py", line 75, in _run
run_node(node, catalog, hook_manager, self._is_async, session_id)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\runner\runner.py", line 331, in run_node
node = _run_node_sequential(node, catalog, hook_manager, session_id)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\runner\runner.py", line 412, in _run_node_sequential
inputs[name] = catalog.load(name)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\io\data_catalog.py", line 481, in load
dataset = self._get_dataset(name, version=load_version)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\io\data_catalog.py", line 380, in _get_dataset
dataset = AbstractDataset.from_config(
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\io\core.py", line 156, in from_config
raise DatasetError(
kedro.io.core.DatasetError: An exception occurred when parsing config for dataset 'df_us_5':
Class 'pickle.PickleDataset' not found, is this a typo?
datajoely
04/09/2024, 2:30 PM
datajoely
04/09/2024, 2:31 PM
engine = joblib
Gabriel Aguiar
04/09/2024, 2:32 PM
joblib 1.3.2
Gabriel Aguiar
04/09/2024, 2:33 PM
datajoely
04/09/2024, 2:33 PM
Gabriel Aguiar
04/09/2024, 2:33 PM
(peloptimize) C:\Dev\kedro_pelopt\sentinela-palletizing\peloptmize>kedro run --pipeline data_processing
[04/09/24 11:33:18] INFO Kedro project peloptmize session.py:321
INFO Using synchronous mode for loading and saving data. Use the --async flag for potential performance gains. https://docs.kedro.org/en/stable/nodes_and_pipelines/run_a_pipeline.html#load-and-save-asynchronously sequential_runner.py:64
WARNING No nodes ran. Repeat the previous command to attempt a new run. runner.py:214
Traceback (most recent call last):
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\io\core.py", line 152, in from_config
class_obj, config = parse_dataset_definition(
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\io\core.py", line 405, in parse_dataset_definition
raise DatasetError(f"Class '{dataset_type}' not found, is this a typo?")
kedro.io.core.DatasetError: Class 'pickle.PickleDataset' not found, is this a typo?
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\Scripts\kedro.exe\__main__.py", line 7, in <module>
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\framework\cli\cli.py", line 198, in main
cli_collection()
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\click\core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\framework\cli\cli.py", line 127, in main
super().main(
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\click\core.py", line 1078, in main
rv = self.invoke(ctx)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\click\core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\click\core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\click\core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\framework\cli\project.py", line 225, in run
session.run(
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\framework\session\session.py", line 392, in run
run_result = runner.run(
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\runner\runner.py", line 117, in run
self._run(pipeline, catalog, hook_or_null_manager, session_id) # type: ignore[arg-type]
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\runner\sequential_runner.py", line 75, in _run
run_node(node, catalog, hook_manager, self._is_async, session_id)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\runner\runner.py", line 331, in run_node
node = _run_node_sequential(node, catalog, hook_manager, session_id)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\runner\runner.py", line 412, in _run_node_sequential
inputs[name] = catalog.load(name)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\io\data_catalog.py", line 481, in load
dataset = self._get_dataset(name, version=load_version)
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\io\data_catalog.py", line 380, in _get_dataset
dataset = AbstractDataset.from_config(
File "C:\Users\gabriel.gomes\AppData\Local\anaconda3\envs\peloptimize\lib\site-packages\kedro\io\core.py", line 156, in from_config
raise DatasetError(
kedro.io.core.DatasetError: An exception occurred when parsing config for dataset 'df_us_5':
Class 'pickle.PickleDataset' not found, is this a typo?
Gabriel Aguiar
04/09/2024, 2:33 PM
datajoely
04/09/2024, 2:34 PM
datajoely
04/09/2024, 2:34 PM
from kedro.datasets.pickle import PickleDataset
Gabriel Aguiar
04/09/2024, 2:35 PM
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[1], line 1
----> 1 from kedro.datasets.pickle import PickleDataset
ModuleNotFoundError: No module named 'kedro.datasets'
Gabriel Aguiar
04/09/2024, 2:35 PM
Gabriel Aguiar
04/09/2024, 2:36 PM
Gabriel Aguiar
04/09/2024, 2:36 PM
pip install kedro-datasets
datajoely
04/09/2024, 2:36 PM
datajoely
04/09/2024, 2:36 PM
datajoely
04/09/2024, 2:36 PM
pip install "kedro-datasets[pickle.PickleDataset]"
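The ModuleNotFoundError earlier in the thread suggests that the separate kedro-datasets distribution was not installed in the environment. Here is a minimal standard-library sketch for checking that before running the pipeline. The helper name `is_installed` is hypothetical; `kedro_datasets` is the import name that appears in a later error message in this thread.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Report on the two top-level packages discussed in the thread.
    for name in ("kedro", "kedro_datasets"):
        status = "installed" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

The helper is meant for top-level names only. For dotted names, `find_spec` imports the parent package first and can raise if the parent is missing.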
Gabriel Aguiar
04/09/2024, 2:38 PM
# 02 Intermediate
"df_{name_us}":
  type: pickle.PickleDataset
  filepath: data/02_intermediate/df_{name_us}.joblib
  engine: joblib
kedro.io.core.DatasetError:
PickleDataset.__init__() got an unexpected keyword argument 'engine'.
Dataset 'df_us_5' must only contain arguments valid for the constructor of 'kedro_datasets.pickle.pickle_dataset.PickleDataset'.
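The error above says the constructor does not accept `engine`. As far as I know, `kedro_datasets.pickle.PickleDataset` selects the serialization library through a `backend` parameter instead. The sketch below uses `inspect.signature` to list which config keys a constructor would reject. `StandInPickleDataset` and `rejected_keys` are hypothetical; the stand-in mirrors only the two relevant parameters so the example runs without Kedro installed.

```python
import inspect


def rejected_keys(cls, config):
    """Return the config keys that cls.__init__ has no parameter for."""
    params = inspect.signature(cls.__init__).parameters
    # A **kwargs parameter accepts anything, so nothing can be rejected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return sorted(key for key in config if key not in params)


class StandInPickleDataset:
    """Hypothetical stand-in; the real class has more parameters."""

    def __init__(self, filepath, backend="pickle"):
        self.filepath = filepath
        self.backend = backend


# The catalog entry from the thread, without the `type` key.
failing = {"filepath": "data/02_intermediate/df_us_5.joblib", "engine": "joblib"}
working = {"filepath": "data/02_intermediate/df_us_5.joblib", "backend": "joblib"}

print(rejected_keys(StandInPickleDataset, failing))  # ['engine']
print(rejected_keys(StandInPickleDataset, working))  # []
```

For the catalog entry in this thread, that would mean replacing `engine: joblib` with `backend: joblib`, assuming the installed kedro-datasets version exposes that parameter.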
Gabriel Aguiar
04/09/2024, 2:39 PM
datajoely
04/09/2024, 2:42 PM
datajoely
04/09/2024, 2:42 PM
Nok Lam Chan
04/09/2024, 3:25 PM
Nok Lam Chan
04/09/2024, 3:26 PM
Nok Lam Chan
04/09/2024, 3:27 PM
_DEFAULT_PACKAGES = ["kedro.io.", "kedro_datasets.", ""]
In 0.19, Kedro does not look for kedro.extras anymore
DatasetError: An exception occurred when parsing config for dataset 'df_us_5': Class 'kedro.extras.datasets.pickle.PickleDataSet' not found, is this a typo?
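To illustrate how a prefix list such as `_DEFAULT_PACKAGES` can drive the lookup of a `type:` string, here is a simplified sketch. It is not Kedro's actual implementation: `resolve_class` is a hypothetical helper, and only the prefix list itself is taken from the message above.

```python
import importlib

# Prefix list quoted in the thread for Kedro 0.19.
_DEFAULT_PACKAGES = ["kedro.io.", "kedro_datasets.", ""]


def resolve_class(type_string, prefixes=None):
    """Try each prefix in turn; return the first matching class, else None."""
    if prefixes is None:
        prefixes = _DEFAULT_PACKAGES
    for prefix in prefixes:
        # Split "pkg.module.ClassName" into module path and class name.
        module_path, _, class_name = (prefix + type_string).rpartition(".")
        if not module_path:
            continue
        try:
            module = importlib.import_module(module_path)
        except ImportError:
            continue  # this prefix does not lead to an importable module
        if hasattr(module, class_name):
            return getattr(module, class_name)
    return None


if __name__ == "__main__":
    # A standard-library class is found through the empty prefix.
    print(resolve_class("collections.OrderedDict"))
    # Returns None wherever kedro.extras cannot be imported.
    print(resolve_class("kedro.extras.datasets.pickle.PickleDataSet"))
```

Under this scheme, `pickle.PickleDataset` can only resolve through the `kedro_datasets.` prefix, which is consistent with the lookup failing until kedro-datasets was installed.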