# questions
Is it possible to import an already packaged Kedro pipeline in a separate script and assign node return values to new variables for use later in the script? I've been trying to get people on our team on board with Kedro, and a couple of us would be really interested in being able to use the values returned by nodes as pieces of larger scripts. Up until now, I've only needed to import `main`, and that has worked for our purposes so far.
A Kedro pipeline run should return a dictionary of datasets, and you can consume it.
So I've only been able to call `main` in whatever script I put it in, which then exits the script. Not sure what I'm doing wrong, but I saw old documentation and examples of others doing it. That was around version ~0.17, though, when some since-removed APIs were still around (even though that case wouldn't be a package).
Do you still have control over that script? If it isn't returning anything, there isn't much you can do short of changing the existing program.
Yeah, what I've been experimenting with is a simple anomaly detection pipeline that I wrote up. Originally I only needed CSV dumps of dataframes, and that's been working great. But putting it in a separate script, I want to avoid the file IO, so I added a `MemoryDataset` return of the same dataframe. But I'm assuming I'm calling the packaged pipeline incorrectly, because the print here won't even execute:
```python
from my_pipeline.__main__ import main

main()
print("done")  # never reached
```
I haven't addressed the issue of getting the return value from the pipeline
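For what it's worth, the usual reason the script dies at that point is that the packaged `main` dispatches to a click command, and click calls `sys.exit()` when the command finishes. A stdlib-only sketch of that behaviour (`cli_main` here is a stand-in, not Kedro code):

```python
import sys


def cli_main():
    # stand-in for a click-based entry point: it does its work,
    # then terminates the interpreter via sys.exit()
    print("pipeline ran")
    sys.exit(0)


try:
    cli_main()
except SystemExit as exc:
    exit_code = exc.code  # 0 means the command succeeded

print(f"script continues; exit code was {exit_code}")
```

Wrapping the call in `try/except SystemExit` lets the rest of the script run, but it still doesn't give you the node return values, which is why the session approach below is the better fit.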
Do you have the definition of your `main`, and can you post it here? Also, which version of Kedro are you on? I think I roughly know what's happening, and this is something I'm eager to fix to make integrating Kedro easier. The GitHub issue I linked may shed some light on what's going on; I'll try to find more time to look at this tomorrow.
That issue is pretty much spot on, I think. Something like that would be awesome. This pipeline was written using Kedro 0.18.8, but I've since upgraded to 0.18.9 with no issues.
`__main__.py` is the default main that is generated when you create a new Kedro project with `kedro new`, and it hasn't been modified in any way. Unless it gets changed when you do a `kedro package`? I'll check, though.
Although, I don't necessarily need to run the pipeline in a script using `main`. It looks like the session solution you mentioned in that GitHub issue will solve my problem, though? Just import `KedroSession` instead.
Also, a diff of my "post-packaged" `__main__.py` and the one generated when the project is created yields no differences:
```python
import importlib
from pathlib import Path

from kedro.framework.cli.utils import KedroCliError, load_entry_points
from kedro.framework.project import configure_project


def _find_run_command(package_name):
    try:
        project_cli = importlib.import_module(f"{package_name}.cli")
        # fail gracefully if cli.py does not exist
    except ModuleNotFoundError as exc:
        if f"{package_name}.cli" not in str(exc):
            raise
        plugins = load_entry_points("project")
        run = _find_run_command_in_plugins(plugins) if plugins else None
        if run:
            # use run command from installed plugin if it exists
            return run
        # use run command from `kedro.framework.cli.project`
        from kedro.framework.cli.project import run

        return run
    # fail badly if cli.py exists, but has no `cli` in it
    if not hasattr(project_cli, "cli"):
        raise KedroCliError(f"Cannot load commands from {package_name}.cli")
    return project_cli.run


def _find_run_command_in_plugins(plugins):
    for group in plugins:
        if "run" in group.commands:
            return group.commands["run"]


def main(*args, **kwargs):
    package_name = Path(__file__).parent.name
    configure_project(package_name)
    run = _find_run_command(package_name)
    run(*args, **kwargs)


if __name__ == "__main__":
    main()
```
I think for now you need to do it the `KedroSession` way, which is similar to the Databricks workflow, because Databricks doesn't like the CLI-style entry point either:
```python
from kedro.framework.project import configure_project
from kedro.framework.session import KedroSession

configure_project("my_pipeline")  # your package name
with KedroSession.create(env=env, conf_source=conf_source) as session:
    result = session.run()  # result is a dict of the outputs you are interested in
```