Is there a code equivalent of the CLI `kedro catalog resolve`? (Kedro #questions)

Carlos Bermejo

08/07/2024, 9:50 PM

Hi, is there a code equivalent of the CLI command `kedro catalog resolve > output_file.yaml`? I want to incorporate it in a function. Thanks!
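For illustration, `kedro catalog resolve` matches the dataset names used in your pipelines against the catalog's dataset factory patterns and writes out the filled-in entries. The stdlib-only sketch below mimics that idea under stated assumptions. It is not Kedro's implementation: Kedro parses patterns with the `parse` library and ranks them by specificity, whereas this sketch simply takes the first matching pattern. The helpers `pattern_to_regex`, `fill` and `resolve_catalog` are hypothetical names.

```python
import json
import re
from typing import Any, Iterable

# Matches a placeholder such as {name} or {namespace} in a pattern or config value.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a factory pattern like "{name}_excel" into an anchored regex."""
    parts = PLACEHOLDER.split(pattern)  # literals at even indices, names at odd
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)"
        for i, part in enumerate(parts)
    )
    return re.compile(f"^{regex}$")


def fill(value: Any, captures: dict[str, str]) -> Any:
    """Recursively substitute known {placeholder}s inside a config entry."""
    if isinstance(value, str):
        return PLACEHOLDER.sub(lambda m: captures.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {k: fill(v, captures) for k, v in value.items()}
    if isinstance(value, list):
        return [fill(v, captures) for v in value]
    return value


def resolve_catalog(config: dict[str, dict], dataset_names: Iterable[str]) -> dict[str, dict]:
    """Return explicit entries plus pattern-resolved entries for the given names."""
    explicit = {k: v for k, v in config.items() if not PLACEHOLDER.search(k)}
    patterns = [(pattern_to_regex(k), v) for k, v in config.items() if PLACEHOLDER.search(k)]
    resolved: dict[str, dict] = {}
    for name in dataset_names:
        if name in explicit:
            resolved[name] = explicit[name]
            continue
        for regex, template in patterns:  # first match wins in this sketch
            match = regex.match(name)
            if match:
                resolved[name] = fill(template, match.groupdict())
                break
        # Names matching nothing are left out (Kedro would treat them as in-memory).
    return resolved


example_config = {
    "companies": {"type": "pandas.CSVDataset", "filepath": "data/01_raw/companies.csv"},
    "{name}_excel": {"type": "pandas.ExcelDataset", "filepath": "data/02_intermediate/{name}.xlsx"},
}
print(json.dumps(resolve_catalog(example_config, ["companies", "shuttles_excel"]), indent=2))
```

Swapping `json.dumps` for PyYAML's `yaml.safe_dump` (if PyYAML is installed) would give YAML output closer to the CLI's `> output_file.yaml` redirect.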

Ravi Kumar Pilla

08/07/2024, 10:01 PM

Hi Carlos, if you have the DataCatalog and pipeline information, you could try this: https://github.com/kedro-org/kedro-viz/blob/main/package/kedro_viz/data_access/managers.py#L72-L90. Kedro-Viz uses it to resolve dataset factory patterns in code.
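The general shape of that approach, as described here, is to walk every pipeline and look up each dataset name in the catalog so that factory patterns get materialized. A minimal sketch of that loop, assuming `pipelines` maps pipeline names to dataset names and `lookup` is a hypothetical stand-in for the catalog's resolution call that raises `KeyError` for unknown names. The real Kedro-Viz code is at the link and works with Kedro's own catalog objects.

```python
from typing import Callable, Iterable, Mapping


def collect_resolved(
    pipelines: Mapping[str, Iterable[str]],
    lookup: Callable[[str], dict],
) -> dict[str, dict]:
    """Resolve every dataset name used by any pipeline, once per name."""
    resolved: dict[str, dict] = {}
    for dataset_names in pipelines.values():
        for name in dataset_names:
            if name in resolved:
                continue  # datasets shared between pipelines only need one lookup
            try:
                resolved[name] = lookup(name)
            except KeyError:
                # The hypothetical lookup raises KeyError when no entry or pattern
                # matches; at run time Kedro would treat such datasets as in-memory.
                continue
    return resolved
```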

Carlos Bermejo

08/07/2024, 10:07 PM

Great! Really appreciate it. I see that's the strategy that Kedro-Viz uses to resolve the catalog. Nice!

Nok Lam Chan

08/08/2024, 10:53 AM

@Carlos Bermejo I'm curious what use case you have in mind. This has been asked before, so if it's useful enough we may end up building it ourselves. I also started a project a while ago that may be what you want: https://github.com/noklam/kedro-inspect/blob/main/src/kedro_inspect/core.py

Carlos Bermejo

08/08/2024, 6:12 PM

We want business SMEs to test the business logic. Each tester has 2-3 pipelines to test in a very large project. We gave them a testing Jupyter notebook where they specify the pipeline they want to test, and in the next cell I wanted to show them which inputs they should play with to test that pipeline and what its outputs are, so that they can go and check those Excel files locally. For that, I took pipeline.inputs and pipeline.outputs and mapped them through a dictionary of file_name: file_path. To build this file-path mapping, I iterated over the dataset names using the resolved catalog:

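# datasets holds the names from pipeline.inputs() (or pipeline.outputs())
# _get_dataset and _describe are private Kedro APIs, so this may break between versions
filepaths = [
    catalog._get_dataset(dataset_name)._describe()["filepath"]
    for dataset_name in datasets
]

catalog_filepath_dict = dict(zip(datasets, filepaths))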

This gives me the POSIX path, so the clients now know where to look when running their tests.
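One caveat with indexing `["filepath"]` directly: `pipeline.inputs()` also contains parameter inputs such as `params:...`, and those (like in-memory intermediate datasets) have no file on disk, so the lookup can raise `KeyError`. A defensive variant, sketched here under that assumption, skips such entries. `build_filepath_map` and its `describe` argument are hypothetical; in this thread's setup `describe` would wrap the private call `catalog._get_dataset(name)._describe()`.

```python
from typing import Callable, Iterable


def build_filepath_map(
    dataset_names: Iterable[str],
    describe: Callable[[str], dict],
) -> dict[str, str]:
    """Map dataset names to file paths, skipping datasets with no filepath."""
    paths: dict[str, str] = {}
    for name in sorted(set(dataset_names)):  # sorted for a stable display order
        filepath = describe(name).get("filepath")
        if filepath is None:
            continue  # e.g. params: inputs or in-memory datasets have no file
        paths[name] = str(filepath)  # PurePosixPath -> plain string
    return paths
```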

Carlos Bermejo

08/08/2024, 6:14 PM

This is the function that gets called in the Jupyter notebook:

from IPython.display import display

def print_paths(catalog_filepath_dict, file_type: str = "inputs"):
    """Show each dataset name and its file path for the given file type."""
    print(f"{len(catalog_filepath_dict)} {file_type.capitalize()} for this pipeline")
    print()
    for file_name, file_path in catalog_filepath_dict.items():
        display(file_name)
        display(file_path)
        print()

It gets called once for "inputs" and once for "outputs".
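A variant of `print_paths` that returns the listing as a string (hypothetical name `format_paths`) is easier to test and lines names and paths up in two columns, which may read more easily for non-technical users. A sketch:

```python
def format_paths(catalog_filepath_dict: dict[str, str], file_type: str = "inputs") -> str:
    """Return the listing print_paths shows, with names and paths in two columns."""
    lines = [f"{len(catalog_filepath_dict)} {file_type.capitalize()} for this pipeline", ""]
    # Pad names to the longest one so the paths start in the same column.
    width = max((len(name) for name in catalog_filepath_dict), default=0)
    for file_name, file_path in catalog_filepath_dict.items():
        lines.append(f"{file_name.ljust(width)}  {file_path}")
    return "\n".join(lines)
```

In the notebook this would be called as `print(format_paths(catalog_filepath_dict, "inputs"))`, once for inputs and once for outputs.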

Nok Lam Chan

08/09/2024, 9:49 PM

So basically you need the path. Have you considered using catalog.load directly, or is it more that someone is going to open the Excel file directly to inspect it?

Carlos Bermejo

08/10/2024, 12:41 AM

The latter. They are going to open and inspect the Excel files. They are business users, completely unfamiliar with Jupyter and data science.
