Philipp Dahlke
01/10/2025, 9:55 PM
I'm trying to use kedro_mlflow to store kedro_datasets_experimental.netcdf datasets as artifacts. Unfortunately I can't make it work.
The problem seems to be path-related:
kedro.io.core.DatasetError:
Failed while saving data to dataset MlflowNetCDFDataset(filepath=S:/…/data/07_model_output/D2-24-25/idata.nc, load_args={'decode_times': False}, protocol=file, save_args={'mode': w}).
'str' object has no attribute 'as_posix'
I tried to investigate it to the best of my abilities and it seems to be related to the initialization of NetCDFDataset. Most datasets inherit from AbstractVersionedDataset and, in __init__, convert the filepath string into a PurePosixPath stored on _filepath. NetCDFDataset is missing that step, so the PurePosixPath is never created. Whether this is the root cause in the end I don't know, but it is the point where other datasets set up their path. In the meantime I thought it might be because mlflow isn't capable of tracking datasets that don't inherit from AbstractVersionedDataset, but the kedro-mlflow documentation says MlflowArtifactDataset is a wrapper for all AbstractDatasets.
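For comparison, this is roughly the pattern I see in other datasets and in the Kedro docs for custom datasets (just a sketch to show where _filepath becomes a PurePosixPath, not the actual NetCDFDataset code; _load/_save/_describe omitted):

from pathlib import PurePosixPath

import fsspec
from kedro.io import AbstractVersionedDataset
from kedro.io.core import get_protocol_and_path

class SomeVersionedDataset(AbstractVersionedDataset):
    def __init__(self, filepath: str, version=None):
        protocol, path = get_protocol_and_path(filepath)
        self._protocol = protocol
        self._fs = fsspec.filesystem(self._protocol)
        # The base class stores this as self._filepath, so it is already a
        # PurePosixPath (not a plain str) when something calls .as_posix() on it.
        super().__init__(
            filepath=PurePosixPath(path),
            version=version,
            exists_function=self._fs.exists,
            glob_function=self._fs.glob,
        )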
I tried to set self._filepath = PurePosixPath(filepath) myself in the installed site-packages copy, but then I get a PermissionError on saving, and that's where my journey has to end. Would have been too good if this one-liner had done it ^^
Thank you guys for your help. Here is some reduced code for what I'm trying to achieve.
catalog.yml
"{dataset}.idata":
type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
dataset:
type: kedro_datasets_experimental.netcdf.NetCDFDataset
filepath: data/07_model_output/{dataset}/idata.nc
save_args:
mode: a
load_args:
decode_times: False
node.py
import arviz as az

def predict(model, x_data):
    idata = model.predict(x_data)
    # convert the InferenceData to an xarray Dataset so NetCDFDataset can save it
    return az.convert_to_dataset(idata)
pipeline.py
from kedro.pipeline import node, pipeline

# `dataset` is the namespace string (e.g. "D2-24-25") that the catalog factory
# "{dataset}.idata" resolves against.
pipeline_inference = pipeline(
    [
        node(
            func=predict,
            inputs={
                "model": f"{dataset}.model",
                "x_data": f"{dataset}.x_data",
            },
            outputs=f"{dataset}.idata",
            name=f"{dataset}.predict_node",
            tags=["training"],
        ),
    ]
)
Juan Luis
01/13/2025, 7:17 AM
1. Is MlflowNetCDFDataset a custom dataset you created? (from the first error you reported)
2. When you used NetCDFDataset inside MlflowArtifactDataset (second code snippet), what error did you get? Could you share the full traceback?
Yolan Honoré-Rougé
01/13/2025, 12:24 PM
NetCDFDataset should convert the filepath to a Path. Can you:
1. try to use pathlib.Path instead of pathlib.PurePosixPath and see if it works?
2. in case it does not, share a minimal reproducible sample of data in the correct format that you can load and save with NetCDFDataset, so that I can try it on my own?
@Juan Luis MlflowNetCDFDataset is created under the hood by MlflowArtifactDataset.
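For reference, the difference between the two in plain Python (nothing Kedro-specific, just what swapping them changes):

from pathlib import Path, PurePosixPath

p = Path("data/07_model_output/D2-24-25/idata.nc")
print(p.as_posix())   # 'data/07_model_output/D2-24-25/idata.nc'
print(p.exists())     # Path is concrete: filesystem checks work

pp = PurePosixPath("data/07_model_output/D2-24-25/idata.nc")
print(pp.as_posix())  # same string; .as_posix() exists on pure paths too
# pp.exists()         # AttributeError: pure paths have no I/O methods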
Rashida Kanchwala
01/13/2025, 12:25 PM
Riley, is there a reason we don't set self._filepath = PurePosixPath(filepath) in NetCDFDataset? If not, can we make the change to the dataset to handle it?
Philipp Dahlke
01/13/2025, 3:06 PM
I tested NetCDFDataset with the change made to self._filepath in NetCDFDataset.__init__.
@Yolan Honoré-Rougé Both versions seem to work, pathlib.Path and pathlib.PurePosixPath. I declared them either like in the other classes, right after self.metadata, or at the end after self._is_multifile. I didn't want to disturb the is_multifile logic by creating it beforehand, but it seems like PurePosixPath can handle getting its own type passed.
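Roughly, the whole local edit is just the following (a sketch; attribute names are as in my copy of netcdf_dataset.py, so approximate):

from pathlib import PurePosixPath

# One-line addition inside NetCDFDataset.__init__, placed either right after
# self.metadata or at the very end after self._is_multifile:
#     self._filepath = PurePosixPath(filepath)

# PurePosixPath accepts an existing pure path as input, so re-wrapping is a no-op:
p = PurePosixPath("data/07_model_output/D2-24-25/idata.nc")
assert PurePosixPath(p) == p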
A minimal sample:
import numpy as np
import arviz as az

def test_netCDF():
    size = 100
    dataset = az.convert_to_inference_data(np.random.randn(size))
    return az.convert_to_dataset(dataset)
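To round-trip it outside the pipeline, something like this should work (a sketch; the filepath is arbitrary):

from kedro_datasets_experimental.netcdf import NetCDFDataset

ds = NetCDFDataset(
    filepath="data/07_model_output/test/idata.nc",  # arbitrary local path
    save_args={"mode": "w"},
    load_args={"decode_times": False},
)
# note: the parent folder must already exist (see the PermissionError trace below)
ds.save(test_netCDF())   # save the xarray Dataset produced by the sample above
reloaded = ds.load()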
@Juan Luis
1. As mentioned by Yolan, this class is created by kedro-mlflow under the hood and is not implemented by me.
2. See below for both traces.
Traceback for _filepath missing as an instance of Path:
Traceback (most recent call last):
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\kedro\io\core.py", line 271, in save
save_func(self, data)
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\kedro_mlflow\io\artifacts\mlflow_artifact_dataset.py", line 63, in _save
local_path = local_path.as_posix()
^^^^^^^^^^^^^^^^^^^
AttributeError: 'str' object has no attribute 'as_posix'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "H:\Programs\Anaconda\envs\.conda_ba_env\Scripts\kedro.exe\main.py", line 7, in <module>
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\kedro\framework\cli\cli.py", line 263, in main
cli_collection()
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\click\core.py", line 1157, in call
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\kedro\framework\cli\cli.py", line 163, in main
super().main(
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\click\core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\click\core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\click\core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\click\core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\kedro\framework\cli\project.py", line 228, in run
return session.run(
^^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\kedro\framework\session\session.py", line 399, in run
run_result = runner.run(
^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\kedro\runner\runner.py", line 113, in run
self._run(pipeline, catalog, hook_or_null_manager, session_id) # type: ignore[arg-type]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\kedro\runner\sequential_runner.py", line 85, in _run
).execute()
^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\kedro\runner\task.py", line 88, in execute
node = self._run_node_sequential(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\kedro\runner\task.py", line 186, in _run_node_sequential
catalog.save(name, data)
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\kedro\io\data_catalog.py", line 438, in save
dataset.save(data)
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\kedro\io\core.py", line 276, in save
raise DatasetError(message) from exc
kedro.io.core.DatasetError: Failed while saving data to dataset MlflowNetCDFDataset(filepath=S:/___Studium/Bachelor_Arbeit/ba_env/bundesliga/data/07_model_output/D1-24-25/pymc/idata_fit.nc, load_args={'decode_times': False}, protocol=file, save_args={'mode': a}).
'str' object has no attribute 'as_posix'
Traceback for _filepath set to Path but missing folders:
Traceback (most recent call last):
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\xarray\backends\file_manager.py", line 211, in _acquire_with_cache_info
file = self._cache[self._key]
~~~~~~~~~~~^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\xarray\backends\lru_cache.py", line 56, in __getitem__
value = self._cache[key]
~~~~~~~~~~~^^^^^
KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('S:\\___Studium\\Bachelor_Arbeit\\ba_env\\bundesliga\\data\\07_model_output\\D1-24-25\\pymc\\idata_fit.nc',), 'a', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False)), '8aa8dfaa-e6a7-47e2-8b44-b700e528ffb8']
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\kedro\io\core.py", line 271, in save
save_func(self, data)
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\kedro_mlflow\io\artifacts\mlflow_artifact_dataset.py", line 66, in _save
super().save.__wrapped__(self, data)
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\kedro_datasets_experimental\netcdf\netcdf_dataset.py", line 172, in save
data.to_netcdf(path=self._filepath, **self._save_args)
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\xarray\core\dataset.py", line 2372, in to_netcdf
return to_netcdf( # type: ignore[return-value] # mypy cannot resolve the overloads:(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\xarray\backends\api.py", line 1856, in to_netcdf
store = store_open(target, mode, format, group, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\xarray\backends\netCDF4_.py", line 452, in open
return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\xarray\backends\netCDF4_.py", line 393, in __init__
self.format = self.ds.data_model
^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\xarray\backends\netCDF4_.py", line 461, in ds
return self._acquire()
^^^^^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\xarray\backends\netCDF4_.py", line 455, in _acquire
with self._manager.acquire_context(needs_lock) as root:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\contextlib.py", line 137, in __enter__
return next(self.gen)
^^^^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\xarray\backends\file_manager.py", line 199, in acquire_context
file, cached = self._acquire_with_cache_info(needs_lock)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\xarray\backends\file_manager.py", line 217, in _acquire_with_cache_info
file = self._opener(*self._args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "src\\netCDF4\\_netCDF4.pyx", line 2521, in netCDF4._netCDF4.Dataset.__init__
File "src\\netCDF4\\_netCDF4.pyx", line 2158, in netCDF4._netCDF4._ensure_nc_success
PermissionError: [Errno 13] Permission denied: 'S:\\___Studium\\Bachelor_Arbeit\\ba_env\\bundesliga\\data\\07_model_output\\D1-24-25\\pymc\\idata_fit.nc'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "H:\Programs\Anaconda\envs\.conda_ba_env\Scripts\kedro.exe\__main__.py", line 7, in <module>
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\kedro\framework\cli\cli.py", line 263, in main
cli_collection()
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\click\core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\kedro\framework\cli\cli.py", line 163, in main
super().main(
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\click\core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\click\core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\click\core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\click\core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\kedro\framework\cli\project.py", line 228, in run
return session.run(
^^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\kedro\framework\session\session.py", line 399, in run
run_result = runner.run(
^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\kedro\runner\runner.py", line 113, in run
self._run(pipeline, catalog, hook_or_null_manager, session_id) # type: ignore[arg-type]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\kedro\runner\sequential_runner.py", line 85, in _run
).execute()
^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\kedro\runner\task.py", line 88, in execute
node = self._run_node_sequential(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\kedro\runner\task.py", line 186, in _run_node_sequential
catalog.save(name, data)
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\kedro\io\data_catalog.py", line 438, in save
dataset.save(data)
File "H:\Programs\Anaconda\envs\.conda_ba_env\Lib\site-packages\kedro\io\core.py", line 276, in save
raise DatasetError(message) from exc
kedro.io.core.DatasetError: Failed while saving data to dataset MlflowNetCDFDataset(filepath=S:/___Studium/Bachelor_Arbeit/ba_env/bundesliga/data/07_model_output/D1-24-25/pymc/idata_fit.nc, load_args={'decode_times': False}, protocol=file, save_args={'mode': a}).
[Errno 13] Permission denied: 'S:\\___Studium\\Bachelor_Arbeit\\ba_env\\bundesliga\\data\\07_model_output\\D1-24-25\\pymc\\idata_fit.nc'