# questions
i
Hi, is there an elegant way to skip nodes in a pipeline if there is already an intermediate dataset? Currently I use a node for checking, but it doesn't seem to be the best way:
node(
    func=nd.check_intermediate,
    inputs="adatasets#{params:experiment}",
    outputs="intermediate_exists",
    name="check_intermediate",
),
I could: 1. create a master pipeline, 2. add a skip-node hook, or 3. any other ideas?
m
What do you mean by "if there is already an intermediate dataset"? What is your use case / what you are trying to do?
i
I would like to skip nodes of a pipeline, or a subpipeline partially, if a certain intermediate result is already saved in the same version. I have found different approaches, but it feels like I am missing a core feature of Kedro.
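To make the goal concrete, here is a minimal sketch of "skip whatever is already saved", written in plain Python rather than against the Kedro API. `Step`, `steps_to_run`, and the dataset names are hypothetical and exist only for illustration. The idea is to walk backwards from the target outputs and stop wherever a dataset already exists.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical stand-in for a pipeline node."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def steps_to_run(steps: list[Step], targets: list[str], existing: set[str]) -> list[str]:
    """Return the names of the steps needed to produce `targets`,
    skipping every step whose required output is already in `existing`."""
    producer = {out: step for step in steps for out in step.outputs}
    scheduled: set[str] = set()
    # Start from the targets that are not saved yet and walk upstream.
    pending = [t for t in targets if t not in existing]
    while pending:
        dataset = pending.pop()
        step = producer.get(dataset)
        if step is None or step.name in scheduled:
            continue  # raw input with no producer, or step already scheduled
        scheduled.add(step.name)
        # Only inputs that are missing force upstream steps to run.
        pending.extend(i for i in step.inputs if i not in existing)
    # Report the steps in their original declaration order.
    return [s.name for s in steps if s.name in scheduled]


steps = [
    Step("load", ("raw",), ("intermediate",)),
    Step("features", ("intermediate",), ("features",)),
    Step("train", ("features",), ("model",)),
]
print(steps_to_run(steps, ["model"], existing={"raw", "intermediate"}))
```

With `intermediate` already saved, only `features` and `train` are selected. Without it, `load` is selected as well.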
m
How do you check for the existence right now?
i
Load the parameters and catalog -> check with pathlib whether the filepath exists -> run either part of the pipeline or all of it
from pathlib import Path

from kedro.config import OmegaConfigLoader
from kedro.pipeline import Pipeline, pipeline


def data_exists(project_name: str = "fresh", search_var: str = "experiment") -> bool:
    # Load parameters and catalog directly from the base config folder
    conf_path = str(Path("../../conf/base").absolute())
    conf_loader = OmegaConfigLoader(conf_source=conf_path)
    experiment = conf_loader["parameters"][search_var]
    for ds_name, ds in conf_loader["catalog"].items():
        # Use the first catalog entry whose name contains both markers
        if search_var in ds_name and project_name in ds_name:
            filepath = Path(ds["filepath"].replace("{experiment}", experiment))
            return filepath.exists()
    # No matching catalog entry: treat the data as missing
    return False


def create_pipeline(**kwargs: dict) -> Pipeline:
    data_pipe = pipeline([...])
    feature_pipe = pipeline([...])
    # Skip the data pipeline when its intermediate output is already saved
    if data_exists():
        return feature_pipe
    return data_pipe + feature_pipe
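The existence check can also be tried on its own, without loading a Kedro project. This is only a sketch under assumptions: `resolve_filepath` and `dataset_exists` are hypothetical helpers, and the catalog is assumed to be a plain dict with the same shape as the loaded `catalog` config (dataset name mapped to a dict with a `filepath` key).

```python
from __future__ import annotations

from pathlib import Path


def resolve_filepath(template: str, params: dict[str, str]) -> Path:
    """Fill `{key}` placeholders in a catalog filepath with parameter values."""
    path = template
    for key, value in params.items():
        path = path.replace("{" + key + "}", value)
    return Path(path)


def dataset_exists(catalog: dict[str, dict], name_parts: list[str], params: dict[str, str]) -> bool:
    """Check the first catalog entry whose name contains every part in `name_parts`."""
    for ds_name, ds in catalog.items():
        if all(part in ds_name for part in name_parts):
            return resolve_filepath(ds["filepath"], params).exists()
    # No matching entry: report the data as missing instead of returning None.
    return False


# Hypothetical catalog entry and parameter, for illustration only.
catalog = {"fresh_experiment_data": {"filepath": "data/02_intermediate/{experiment}/data.parquet"}}
params = {"experiment": "exp_001"}
print(resolve_filepath(catalog["fresh_experiment_data"]["filepath"], params))
print(dataset_exists(catalog, ["fresh", "experiment"], params))
```

Returning `False` when nothing matches keeps the result a real `bool`, so the caller's `if` never has to deal with `None`.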
m
🙈
I suggest learning and understanding Kedro concepts first before brute-force writing the pipeline

https://www.youtube.com/watch?v=DD7JuYKp6BA&list=PL-JJgymPjK5LddZXbIzp9LWurkLGgB-nY&pp=iAQB

i
I have already watched some of them; I will have another look.
Ah, I understand. Thanks ;)
Found my solution here: https://getindata.com/blog/kedro-dynamic-pipelines/ 😅