Hi, is there a code equivalent of the CLI `kedro c...
# questions
c
Hi, is there a code equivalent of the CLI
kedro catalog resolve > output_file.yaml
? I want to incorporate it in a function. Thanks!
👍🏼 1
r
Hi Carlos, if you have the DataCatalog and pipeline information, you could try this - https://github.com/kedro-org/kedro-viz/blob/main/package/kedro_viz/data_access/managers.py#L72-L90 Kedro-Viz uses this to resolve factory patterns in the code
👍 1
c
Great! Really appreciate it. I see that's the strategy that Kedro-Viz uses to resolve the catalog. Nice!
👍 1
n
@Carlos Bermejo Curious what's the use case you have in mind, this has been asked before so if it's useful enough we may end up building it ourselves. I also have a project started a while ago which may be what you want. https://github.com/noklam/kedro-inspect/blob/main/src/kedro_inspect/core.py
c
We want business SMEs to test the business logic. Each tester has 2-3 pipelines to test in a very large project. We provided them with a testing Jupyter notebook where they specify the pipeline they want to test and I wanted to return for them in the next cell what are they inputs they should be playing with to test that pipeline and what are the outputs for it so that they can go and check those Excel files locally. For that I mapped the pipeline.inputs and pipeline.outputs and passed those through a dictionary of file_name:file_path. In order to build this file path mapping dictionary I iterated the file_names using the resolved catalog:
Copy code
filepaths = [
    catalog._get_dataset(dataset_name)._describe()["filepath"]
    for dataset_name in datasets
]

catalog_filepath_dict = dict(zip(datasets, filepaths))
This gives me the Posix path. Now the clients can now where to look at when running their tests.
This is the function that gets called in the Jupyter:
Copy code
def print_paths(catalog_filepath_dict, file_type: str = "inputs"):
    print(f"{len(catalog_filepath_dict)} {file_type.capitalize()} for this pipeline ")
    print("")
    for file_name, file_path in catalog_filepath_dict.items():
        display(file_name)
        display(file_path)
        print()
It gets run for "inputs" and "outputs"
n
So basically you need the path, have you considered using catalog.load directly? Or is it more like someone going to open an Excel file directly to inspect?
c
The latter. They are going to open and inspect the Excel files. They are business users, completely foreign to Jupyter or Data Science
👍🏼 1