Swamini Khurana
06/26/2024, 8:04 AM
In the catalog.yml file, I would like to specify load_args to load a dataset where the load_args are taken from a YAML file written during the execution of a different node, e.g.
dummy_dataset:
  filepath: /path/to/dataset
  type: some_spatial_dataset
  load_args:
    dummy_arg: <4 element tuple from dummy_arg.yml>
My understanding was that I could do this using Hooks, by specifying in before_dataset_loaded that this should happen prior to the loading of dummy_dataset (example seen here: https://docs.kedro.org/en/stable/hooks/examples.html).
Considerations: The dataset type is an abstract dataset for geospatial datasets/vector files
Questions for the Kedro team:
• Is this possible to execute?
• Are we approaching this correctly or have we missed something?
marrrcin
06/26/2024, 8:26 AM
class LazyDataSet(AbstractDataSet):
    # constructors and other stuff
    def _load(self):
        # Return a loader function instead of the data, so a node can supply the path later
        def lazy_loader(path):
            return PickleDataSet(path).load()
        return lazy_loader
And then you use it in two nodes:
1. node(inputs="from_sql_query", func=<extract the path you need>, outputs="path_you_need")
2. node(inputs=["path_you_need", "lazy_dataset"], func=lambda path, lazy: lazy(path))
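The lazy-loading idea can be tried outside Kedro. The following is only a minimal sketch: LazyPickleLoader is a hypothetical stand-in that does not inherit from Kedro's dataset base class, it uses the standard-library pickle module instead of PickleDataSet, and extract_path, load_lazily and demo are illustrative names that play the role of the two nodes.

```python
import pickle
import tempfile
from pathlib import Path


class LazyPickleLoader:
    """Hypothetical stand-in for a custom dataset (not Kedro's base class)."""

    def load(self):
        # Return a function rather than data: the path is only known
        # after an upstream node has produced it.
        def lazy_loader(path):
            with open(path, "rb") as handle:
                return pickle.load(handle)

        return lazy_loader


def extract_path(query_result):
    # Plays the role of the first node: pull the path out of an upstream result.
    return query_result["path"]


def load_lazily(path, lazy):
    # Plays the role of the second node: call the deferred loader with the path.
    return lazy(path)


def demo():
    # Write a small pickle file, then load it through the two "nodes".
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "data.pkl"
        target.write_bytes(pickle.dumps({"bounds": (0, 0, 10, 10)}))
        path = extract_path({"path": str(target)})
        return load_lazily(path, LazyPickleLoader().load())
```

The point of the design is that the catalog entry never needs to know the runtime value; the node that consumes the dataset passes it in when it calls the returned function.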
Swamini Khurana
06/26/2024, 9:51 AM
marrrcin
06/26/2024, 9:56 AM
Elena Khaustova
06/26/2024, 10:50 AM
load_args of dummy_dataset at runtime, so you want to redefine the parameters set in the catalog.yml?
Swamini Khurana
06/26/2024, 10:58 AM
Elena Khaustova
06/26/2024, 11:24 AM
catalog as input. So you will need to access the target parameters in the catalog via the private _datasets property and modify them. We do not recommend doing this, so there are no public methods for it; you will need to look at the DataCatalog implementation to be able to do it.
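For illustration only, a hook that modifies load_args through the private attribute might be sketched as follows. This is not a recommended or official approach. DynamicLoadArgsHooks and read_args are hypothetical names, the sketch assumes that the dataset keeps its arguments in a _load_args dict (a common convention, not a guarantee), and the layout of catalog._datasets may differ between Kedro versions. The fallback hook_impl only exists so that the sketch can run without Kedro installed.

```python
try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # fallback so the sketch runs without Kedro installed
    def hook_impl(func):
        return func


class DynamicLoadArgsHooks:
    """Hypothetical hook that updates load_args just before a dataset loads."""

    def __init__(self, dataset_name, read_args):
        self.dataset_name = dataset_name
        # read_args: callable returning a dict of load_args,
        # e.g. values parsed from dummy_arg.yml (parsing is left to the caller).
        self.read_args = read_args
        self._catalog = None

    @hook_impl
    def after_catalog_created(self, catalog):
        # Keep a reference, because before_dataset_loaded does not receive the catalog.
        self._catalog = catalog

    @hook_impl
    def before_dataset_loaded(self, dataset_name):
        if dataset_name != self.dataset_name or self._catalog is None:
            return
        # Private access: not a public API and may change between releases.
        dataset = self._catalog._datasets[dataset_name]
        # Assumption: the dataset stores its arguments in a `_load_args` dict.
        dataset._load_args.update(self.read_args())
```

In a Kedro project such a class would typically be registered through the HOOKS tuple in settings.py, for example with a read_args callable that returns {"dummy_arg": (...)} once the other node has written its file.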
Here is an example of injecting the dynamic behaviour for the mlflow plugin.
Swamini Khurana
06/26/2024, 1:01 PM
Elena Khaustova
06/26/2024, 1:11 PM