fmfreeze
05/22/2023, 12:51 PM
def create_pipeline(**kwargs) -> Pipeline:
    return pipeline([
        node(func=do_stuff, inputs=[], outputs='MyMemDS'),
        node(func=do_more_stuff, inputs=['MyMemDS'], outputs='SecondMemDS')
    ])
I thought my conf/base/catalog.yml needs the entries:
MyMemDS:
  type: MemoryDataSet
SecondMemDS:
  type: MemoryDataSet
But when I run the pipeline (which works, also with kedro-viz), it does not use the catalog.yml entries at all.
The output of my first node is an empty {} dictionary, and if I rename or delete the entries in catalog.yml it "works" like before and the first node returns an empty dictionary.
Do I need to register the catalog anywhere? I simply want to access the object which is returned by my do_stuff() function.
What am I missing?
FlorianGD
05/22/2023, 1:05 PM
MemoryDataSets, Kedro instantiates it for you when it encounters a dataset not present in your catalog.
fmfreeze
05/22/2023, 1:10 PM
session.run(to_outputs=["MyMemDS"]) in kedro ipython, it outputs only {} although do_stuff() returns something (which is not a dict at all)?
FlorianGD
05/22/2023, 1:11 PM
fmfreeze
05/22/2023, 1:13 PM
FlorianGD
05/22/2023, 1:13 PM
fmfreeze
05/22/2023, 1:18 PM
def do_stuff():
    output = ["this", "is", "my", "output"]
    return output
session.run(to_outputs=["MyMemDS"])
would return ["this", "is", "my", "output"]
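The fallback FlorianGD mentions earlier (an in-memory dataset being created for any name missing from the catalog) can be sketched in plain Python. This is only a toy model, not Kedro's implementation: `ToyMemoryDataSet`, `ToyCatalog`, and `toy_run` are hypothetical names invented for illustration, and `toy_run` simply executes every node in order without any pipeline filtering.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# A toy node: (function, input dataset names, output dataset name).
# Hypothetical structure, not kedro.pipeline.node.
ToyNode = Tuple[Callable[..., Any], List[str], str]


class ToyMemoryDataSet:
    """Hypothetical in-memory dataset: keeps the saved object as-is."""

    def __init__(self) -> None:
        self._data: Any = None

    def save(self, data: Any) -> None:
        self._data = data

    def load(self) -> Any:
        return self._data


class ToyCatalog:
    """Hypothetical catalog that falls back to in-memory datasets."""

    def __init__(self, datasets: Optional[Dict[str, ToyMemoryDataSet]] = None) -> None:
        self.datasets: Dict[str, ToyMemoryDataSet] = dict(datasets or {})

    def _get(self, name: str) -> ToyMemoryDataSet:
        # Undeclared name: create an in-memory dataset on the fly.
        if name not in self.datasets:
            self.datasets[name] = ToyMemoryDataSet()
        return self.datasets[name]

    def save(self, name: str, data: Any) -> None:
        self._get(name).save(data)

    def load(self, name: str) -> Any:
        return self._get(name).load()


def toy_run(nodes: List[ToyNode], catalog: ToyCatalog, wanted: List[str]) -> Dict[str, Any]:
    """Run every node in order, then return the requested outputs."""
    for func, inputs, output in nodes:
        args = [catalog.load(name) for name in inputs]
        catalog.save(output, func(*args))
    return {name: catalog.load(name) for name in wanted}


def do_stuff() -> List[str]:
    return ["this", "is", "my", "output"]


nodes: List[ToyNode] = [(do_stuff, [], "MyMemDS")]

# Same result whether or not "MyMemDS" is declared up front.
undeclared = toy_run(nodes, ToyCatalog(), ["MyMemDS"])
declared = toy_run(nodes, ToyCatalog({"MyMemDS": ToyMemoryDataSet()}), ["MyMemDS"])
print(undeclared)
print(declared)
```

In this model, declaring a memory dataset explicitly changes nothing, which matches the observation that renaming or deleting the catalog.yml entries made no difference.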
FlorianGD
05/22/2023, 1:27 PM
pipeline_registry.py
In [1]: session.run(pipeline_name="name", to_outputs=["MyMemDS"])
[05/22/23 15:26:16] INFO     Kedro project kedro-nouveau                        session.py:360
                    INFO     Running node: do_stuff(None) -> [MyMemDS]          node.py:329
                    INFO     Saving data to 'MyMemDS' (MemoryDataSet)...        data_catalog.py:382
                    INFO     Completed 1 out of 1 tasks                         sequential_runner.py:85
                    INFO     Pipeline execution completed successfully.         runner.py:93
                    INFO     Loading data from 'MyMemDS' (MemoryDataSet)...     data_catalog.py:343
Out[1]: {'MyMemDS': ['this', 'is', 'my', 'output']}
I think you are trying to run the __default__ pipeline if you call run without a pipeline name, and maybe it is empty
pipeline_registry.py:
def register_pipelines() -> Dict[str, Pipeline]:
    """Register the project's pipelines.

    Returns:
        A mapping from a pipeline name to a ``Pipeline`` object.
    """
    return {
        "name": create_pipeline(),
        "__default__": create_pipeline(),
    }
you can also run it without providing the pipeline_name
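FlorianGD's hypothesis (a run without a pipeline name falls back to `__default__`, which may be empty) can be illustrated with a small plain-Python model. It is a sketch only and does not use Kedro: `toy_session_run` and the registry layout are hypothetical stand-ins, and each toy node is just a function paired with an output name.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# A toy node: (function taking no inputs, output name). Hypothetical structure.
ToyNode = Tuple[Callable[[], Any], str]


def do_stuff() -> List[str]:
    return ["this", "is", "my", "output"]


def toy_session_run(
    registry: Dict[str, List[ToyNode]],
    pipeline_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Pick a pipeline by name, falling back to "__default__", and run it."""
    name = pipeline_name or "__default__"
    nodes = registry[name]
    # An empty pipeline has no nodes, so there are no outputs to return.
    return {output: func() for func, output in nodes}


# Registry where only the named pipeline has nodes.
empty_default = {
    "name": [(do_stuff, "MyMemDS")],
    "__default__": [],
}
# Registry where the default points at the same nodes.
filled_default = {
    "name": [(do_stuff, "MyMemDS")],
    "__default__": [(do_stuff, "MyMemDS")],
}

print(toy_session_run(empty_default))                        # empty default pipeline
print(toy_session_run(empty_default, pipeline_name="name"))  # named pipeline
print(toy_session_run(filled_default))                       # populated default
```

Under these assumptions, an empty `{}` result points at which pipeline was selected rather than at the dataset configuration.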
fmfreeze
05/22/2023, 1:35 PM
pipeline_registry.py looked like this:
def register_pipelines() -> Dict[str, Pipeline]:
    """Register the project's pipelines.

    Returns:
        A mapping from pipeline names to ``Pipeline`` objects.
    """
    # pipelines = find_pipelines()
    my_pipeline = create_pipeline()
    pipelines = {"__default__": my_pipeline}
    return pipelines
session.run() it does not output the task/INFO logs.
I think something is not configured properly, but I don't know where to search for the root of the problem.
FlorianGD
05/22/2023, 1:41 PM
kedro new you should have a working config out of the box
fmfreeze
05/22/2023, 1:44 PM
kedro new but then I copied the files into the folder structure I have to use in my company, as stated in https://kedro-org.slack.com/archives/C03RKP2LW64/p1683048150213349
This seemed to work fine, except for the issue I described above.
Juan Luis
05/23/2023, 6:19 PM
fmfreeze
05/26/2023, 12:21 PM
Juan Luis
05/26/2023, 12:24 PM