# questions
f
I hope this is a simple question and I am just missing a basic configuration. When I write a simple pipeline like:
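from kedro.pipeline import Pipeline, node, pipeline

def create_pipeline(**kwargs) -> Pipeline:
    # do_stuff and do_more_stuff are the project's own node functions
    return pipeline([
        node(func=do_stuff, inputs=[], outputs='MyMemDS'),
        node(func=do_more_stuff, inputs=['MyMemDS'], outputs='SecondMemDS')
    ])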
I thought my
conf/base/catalog.yml
needs the entries:
MyMemDS:
  type: MemoryDataSet
SecondMemDS:
  type: MemoryDataSet
But when I run the pipeline - which works, also with kedro-viz - it does not utilize
catalog.yml
entries at all. The output of my first node is an empty
{}
dictionary and if I rename or delete the entries in
catalog.yml
it "works" like before and the first node returns an empty dictionary. Do I need to register the catalog anywhere? I simply want to access the object which is returned by my
do_stuff()
function. What am I missing?
f
You don't have to define the
MemoryDataSet
s, Kedro instantiates them for you when it encounters a dataset that is not present in your catalog.
If you want to persist the data, then an entry in the catalog is mandatory.
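To illustrate the fallback described above, here is a toy model in plain Python. It is not Kedro's implementation: `ToyMemoryDataset` and `ToyCatalog` are hypothetical names made up for this sketch, and the only behaviour it tries to mirror is "a dataset name that is not declared gets an in-memory holder automatically".

```python
# Toy model only: NOT Kedro's code, just the idea of an in-memory fallback.

class ToyMemoryDataset:
    """Holds a single Python object in memory."""

    def __init__(self):
        self._data = None

    def save(self, data):
        self._data = data

    def load(self):
        return self._data


class ToyCatalog:
    """Maps dataset names to dataset objects."""

    def __init__(self, declared=None):
        # `declared` stands in for whatever catalog.yml would define.
        self._datasets = dict(declared or {})

    def save(self, name, data):
        # Undeclared names fall back to a fresh in-memory dataset.
        self._datasets.setdefault(name, ToyMemoryDataset()).save(data)

    def load(self, name):
        return self._datasets[name].load()


def do_stuff():
    return ["this", "is", "my", "output"]


catalog = ToyCatalog()  # nothing declared, like an empty catalog.yml
catalog.save("MyMemDS", do_stuff())
result = catalog.load("MyMemDS")
print(result)
```

In this sketch the node output can be passed on even though nothing was declared, which is why the `MemoryDataSet` entries in `catalog.yml` are optional.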
f
Thank you @FlorianGD, so obviously it is not about a wrong catalog configuration. Still, why does the node return an empty dictionary? When I run
session.run(to_outputs=["MyMemDS"])
in kedro ipython, it outputs only
{}
although
do_stuff()
returns something (which is not a dict at all)?
f
are you sure that your function outputs something?
f
If you mean whether it returns something, then yes. Directly before the return statement I log the type of the return value.
f
Could you show the code?
f
For my minimal example it is only:
def do_stuff():
    output = ["this", "is", "my", "output"]
    return output
I'd expect - in a kedro ipython session - that
session.run(to_outputs=["MyMemDS"])
would return
["this", "is", "my", "output"]
f
I cannot reproduce this. I added a pipeline called "name" in
pipeline_registry.py
In [1]: session.run(pipeline_name="name", to_outputs=["MyMemDS"])
[05/22/23 15:26:16] INFO     Kedro project kedro-nouveau                               session.py:360
                    INFO     Running node: do_stuff(None) -> [MyMemDS]                    node.py:329
                    INFO     Saving data to 'MyMemDS' (MemoryDataSet)...          data_catalog.py:382
                    INFO     Completed 1 out of 1 tasks                       sequential_runner.py:85
                    INFO     Pipeline execution completed successfully.                  runner.py:93
                    INFO     Loading data from 'MyMemDS' (MemoryDataSet)...       data_catalog.py:343
Out[1]: {'MyMemDS': ['this', 'is', 'my', 'output']}
I think you are trying to run the
__default__
pipeline if you call
run
without a pipeline name, and maybe it is empty.
With this in
pipeline_registry.py
:
def register_pipelines() -> Dict[str, Pipeline]:
    """Register the project's pipelines.

    Returns:
        A mapping from a pipeline name to a ``Pipeline`` object.
    """
    return {
        "name": create_pipeline(),
        "__default__": create_pipeline(),
    }
you can also run it without providing the
pipeline_name
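As a rough sketch of the lookup described above (assuming, as the message suggests, that a run without a pipeline name falls back to the `__default__` entry): `toy_select_pipeline` is a hypothetical helper, not a Kedro function, and the pipelines are represented as plain lists of node names.

```python
# Toy model only: pipelines are lists of node names, not Kedro Pipeline objects.

def toy_select_pipeline(registry, pipeline_name=None):
    """Pick the named pipeline, or "__default__" when no name is given."""
    name = pipeline_name or "__default__"
    return registry[name]


nodes = ["do_stuff", "do_more_stuff"]

# Registry where "__default__" was left empty.
empty_default = {"name": nodes, "__default__": []}
# Registry where "__default__" points at the same pipeline as "name".
filled_default = {"name": nodes, "__default__": nodes}

selected_empty = toy_select_pipeline(empty_default)
selected_filled = toy_select_pipeline(filled_default)
selected_named = toy_select_pipeline(empty_default, pipeline_name="name")
print(selected_empty, selected_filled, selected_named)
```

With an empty `__default__`, a run without a name would have no nodes to execute in this model, which is the situation suspected in the message above.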
f
interesting... my
pipeline_registry.py
looked like this:
def register_pipelines() -> Dict[str, Pipeline]:
    """Register the project's pipelines.

    Returns:
        A mapping from pipeline names to ``Pipeline`` objects.
    """
    # pipelines = find_pipelines()
    my_pipeline = create_pipeline()
    pipelines = {"__default__": my_pipeline}
    return pipelines
Further, if I run
session.run()
it does not output the task/INFO logs. I think something is not configured properly, but I don't know where to search for the root of the problem.
f
How did you create the project? With
kedro new
you should have a working config out of the box
f
more or less: I created a project with
kedro new
but then I copied the files into the folder structure I have to use in my company, as stated in https://kedro-org.slack.com/archives/C03RKP2LW64/p1683048150213349. This seemed to work fine, except for the issue I described above.
j
hi @fmfreeze, did you manage to solve the issue? (also thanks a lot @FlorianGD for chiming in!)
f
just to document: This issue disappeared once I deleted the identically named MemoryDatasets from the catalog.yml (and restarted my IDE).
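A possible explanation for why deleting the entries helped, offered as an assumption rather than a confirmed diagnosis: as I understand the Kedro 0.18 runner, a run only returns those pipeline outputs that are not registered in the catalog. A dataset declared in `catalog.yml` would then still be saved, but left out of the returned dictionary. The toy function below (`toy_run` is a hypothetical name, not a Kedro API) models only that filtering step.

```python
# Toy model only: mimics "return just the outputs the catalog does not know about".

def toy_run(node_outputs, catalog_names):
    """Return the outputs whose names are not registered in the catalog."""
    return {
        name: value
        for name, value in node_outputs.items()
        if name not in catalog_names
    }


outputs = {"MyMemDS": ["this", "is", "my", "output"]}

# Entries present in catalog.yml: the output is treated as handled by the catalog.
with_entries = toy_run(outputs, catalog_names={"MyMemDS", "SecondMemDS"})
# Entries deleted from catalog.yml: the output is returned to the caller.
without_entries = toy_run(outputs, catalog_names=set())

print(with_entries)
print(without_entries)
```

Under that assumption, the empty `{}` seen earlier and the populated result after removing the entries are both expected.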
j
thanks for getting back, glad the issue went away!