# questions
r
Hi! I am currently migrating Kedro from 0.17.2 to 0.18.1. In the current version (0.17.2), the config loader and catalog are registered in hooks.py via the `register_config_loader` and `register_catalog` methods respectively. The register config loader method returns a `TemplatedConfigLoader`, while register catalog returns a `DataCatalog`. But this hooks implementation was removed in Kedro 0.18.*. Could anyone please suggest how these methods can be implemented in Kedro 0.18.1?
Copy code
from typing import Any, Dict, Iterable, Optional

from kedro.config import TemplatedConfigLoader
from kedro.framework.hooks import hook_impl
from kedro.io import DataCatalog


class ProjectHooks:
    @hook_impl
    def register_config_loader(
        self,
        conf_paths: Iterable[str],
        env: str,
        extra_params: Dict[str, Any],
    ) -> TemplatedConfigLoader:
        return TemplatedConfigLoader(conf_paths, globals_pattern="*globals.yml")

    @hook_impl
    def register_catalog(
        self,
        catalog: Optional[Dict[str, Dict[str, Any]]],
        credentials: Dict[str, Dict[str, Any]],
        load_versions: Dict[str, str],
        save_version: str,
        # journal: Journal,
    ) -> DataCatalog:
        return DataCatalog.from_config(
            catalog, credentials, load_versions, save_version,
            # journal,
        )
m
Hi @Rassul Yermagambet, you should now use `settings.py` instead to set a custom config loader and catalog. See the migration guide https://github.com/kedro-org/kedro/blob/main/RELEASE.md#breaking-changes-to-the-api-7 as well as: https://docs.kedro.org/en/stable/kedro_project_setup/settings.html
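For your two hooks above, something along these lines in `settings.py` should be roughly equivalent (an untested sketch; the `globals_pattern` value is copied from your hook, and you probably don't need to set `DATA_CATALOG_CLASS` at all since `DataCatalog` is already the default):
Copy code
# settings.py -- minimal sketch of the 0.18.x equivalent of the old hooks
from kedro.config import TemplatedConfigLoader

# Use the templated loader instead of the default ConfigLoader
CONFIG_LOADER_CLASS = TemplatedConfigLoader

# Keyword arguments passed to the loader's constructor
CONFIG_LOADER_ARGS = {"globals_pattern": "*globals.yml"}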
r
Thank you! I was doing as stated in the docs, but was getting an error. I was able to resolve it.
@Merel Unfortunately I'm stuck at another error 😅. Could you please help to resolve it:
ValueError: Failed to format pattern '${folders.ref}': no config value found, no default provided
Filepaths with formats are defined in globals.yml as follows:
Copy code
# file /conf/base/globals
base_dir: ""
folders:
    ref: "data/reference"
    raw: "data/base/01_raw"
    intermediate: "data/base/02_intermediate"
    primary: "data/base/03_primary"
    features: "data/base/04_feature"
    model_input: "data/base/05_model_input"
    models: "data/base/06_models"
    model_output: "data/base/07_model_output"
    reporting: "data/base/08_reporting"

# timezone set for the whole pipeline
pipeline_timezone: 'UTC'
I defined the config files in settings.py as follows:
Copy code
CONFIG_LOADER_CLASS = TemplatedConfigLoader(
    # conf_source=str(Path(__file__).parents[2].resolve() / settings.CONF_SOURCE),
    conf_source = "conf",
    base_env="base",
    default_run_env="local",
)

CONFIG_LOADER_ARGS = {
                    "globals_pattern":"*globals.yml",
                    "config_pattern": {
                        "catalog": ["catalog*", "catalog*/**", "**/catalog*"],
                        "parameters": ["parameters*", "parameters*/**", "**/parameters*"],
                        "credentials": ["credentials*", "credentials*/**", "**/credentials*"],
                        }
}

CONF_CATALOG = CONFIG_LOADER_CLASS["catalog"]
CONF_CREDENTIALS = CONFIG_LOADER_CLASS["credentials"]
DATA_CATALOG_CLASS = DataCatalog.from_config(
    catalog=CONF_CATALOG, credentials=CONF_CREDENTIALS
)
m
As far as I can tell, you are using the default `TemplatedConfigLoader` settings, so you can just do:
Copy code
from kedro.config import TemplatedConfigLoader  # new import

CONFIG_LOADER_CLASS = TemplatedConfigLoader
You also don't need to overwrite the patterns like you have here in `CONFIG_LOADER_ARGS`. Just to double check though, you're using Kedro 0.18.1, right?
What you're trying to do here:
Copy code
CONF_CATALOG = CONFIG_LOADER_CLASS["catalog"]
CONF_CREDENTIALS = CONFIG_LOADER_CLASS["credentials"]
isn't possible until 0.18.4, but I also don't think this is necessary because you're using the default `DataCatalog`.
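If you want to sanity-check that your globals resolve, a rough sketch is to instantiate the loader yourself and pull a config group with `get` (the `loader["catalog"]` syntax only arrives in 0.18.4; the paths below assume your project layout):
Copy code
# quick check in a Python shell -- a sketch, not project code
from kedro.config import TemplatedConfigLoader

loader = TemplatedConfigLoader(
    conf_source="conf",          # path to your conf/ folder
    base_env="base",
    default_run_env="local",
    globals_pattern="*globals.yml",
)

# Load all catalog files; templated values such as ${folders.ref}
# should now be filled in from globals.yml
catalog_conf = loader.get("catalog*", "catalog*/**")
print(catalog_conf)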
r
I compared the changes introduced between Kedro 0.18.1 and 0.18.6 and concluded that it would be convenient to upgrade directly to Kedro 0.18.6, in order to more easily overcome some errors and implement the data-catalog-related structure. As of now, I am using Kedro 0.18.6. I tried using the default classes as you suggested:
Copy code
CONFIG_LOADER_CLASS = TemplatedConfigLoader
DATA_CATALOG_CLASS = DataCatalog
But I am getting the following error:
ValueError: Failed to format pattern '${telegram_logger.token}': no config value found, no default provided
The logger is defined in logging.yml. I tried several modifications of the class parameters, but all of the errors raised relate to the value error above.
m
In what file are you referencing `${telegram_logger.token}`, and is the value for `telegram_logger.token` within your globals?
r
I am referencing it in `logging.yml` under `conf/base`. It is not within `globals`. For context: it is used for the Telegram bot API to send log messages. I removed it for a while to make further progress, as it is not so important, but it would be great if you could help resolve that error later. After removing the Telegram handler from `logging.yml`, I got the following error:
ModularPipelineError: Failed to map datasets and/or parameters: params:train_model, params:train_model.report
Copy code
prediction_pipeline = Pipeline(
        [
                ...... 
                parameters={
                    'params:train_model': 'params:train_power_model',
                    'params:train_model.report': 'params:train_power_model.report',
                ......
        ])
`train_model` is the custom modular pipeline. It is also used as follows in `parameter.yml`:
Copy code
train_model_power:
    type: pickle.PickleDataSet
    filepath: ${folders.models}/model_power_21_02_23.pickle
    layer: train_model
But I don't understand why and how it is passed as a parameter.
m
This error
ModularPipelineError: Failed to map datasets and/or parameters: params:train_model, params:train_model.report
is basically telling you that the parameters `train_model` and `train_model.report` can't be found. Do you have parameters with those names?
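For reference, the parameter mapping in a modular pipeline only works if the inner pipeline really consumes those parameter inputs, roughly like this (a sketch only; the node function is made up, the names are taken from your snippet):
Copy code
# illustrative sketch, not your actual code
from kedro.pipeline import node, pipeline

def train(model_options):
    ...  # placeholder node function

# The inner pipeline must actually consume params:train_model as an input,
# otherwise the mapping below has nothing to map and raises ModularPipelineError
base = pipeline([node(train, inputs="params:train_model", outputs="model")])

# The wrapper remaps it to params:train_power_model, which in turn has to
# exist as a top-level key in parameters.yml when the pipeline runs
power = pipeline(
    base,
    parameters={"params:train_model": "params:train_power_model"},
    namespace="train_power_model",
)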
r
This is actually the problem from which I started this thread 😅. There is no `train_model` in the parameters. The project only has a modular pipeline named `train_model`. I did not understand how it was implemented in Kedro 0.17.2, but it worked that way. That's why I thought the issue might be related to the correct way of registering the catalog files.
m
But in this snippet you posted:
Copy code
prediction_pipeline = Pipeline(
        [
                ...... 
                parameters={
                    'params:train_model': 'params:train_power_model',
                    'params:train_model.report': 'params:train_power_model.report',
                ......
        ])
you reference `params:train_model`. So what are you trying to do there?
r
As far as I understood, the pipeline provided here is constructed using a modular pipeline named 'train_model'. However, I'm currently struggling to comprehend how the inputs and parameters are being passed without any explicit prior declaration. The only place these declarations seem to exist is in an example notebook left by the project developers, which showcases the usage of the 'train_model' module. Because of this, I'm uncertain whether Kedro offers any mechanisms to achieve this without explicit declarations:
Copy code
pipeline(
                pipe=train_model.create_pipeline().only_nodes('train_model.load_regressor',
                                                              'train_model.add_transformers',
                                                              'train_model.train_model',
                                                              'train_model.create_train_predictions',
                                                              'train_model.create_test_predictions',
                                                              'train_model.generate_performance_report'
                                                              ),
                inputs={
                    'train_model.train_set': 'data_master_train',
                    'train_model.test_set': 'data_master_test',
                    'train_model.input': 'data_master',
                    'train_model.td': 'tag_dictionary',
                },
                parameters={
                    'params:train_model': 'params:train_power_model',
                    'params:train_model.report': 'params:train_power_model.report',
                },
                outputs={
                    'train_model.train_set_model': 'train_model_power',
                    'train_model.train_set_feature_importance': 'power_model_feature_importance',
                    'train_model.train_set_predictions': 'train_set_power_model_predictions',
                    'train_model.train_set_metrics': 'train_set_power_model_metrics',
                    'train_model.test_set_predictions': 'test_set_power_model_predictions',
                    'train_model.test_set_metrics': 'test_set_power_model_metrics',
                },
                namespace='train_power_model'
            ),
m
So usually your inputs/outputs should be declared in `catalog.yml` and parameters in `parameters.yml`, and Kedro then loads and saves them for you when running the pipeline.
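As a rough illustration of that wiring (the node function is made up, and the dataset/parameter names just echo the ones in your snippets): `data_master` would be an entry in catalog.yml, `train_power_model` a top-level key in parameters.yml, and Kedro injects both into the node at run time:
Copy code
# illustrative sketch, not your actual pipeline
from kedro.pipeline import Pipeline, node

def train_model(data, model_options):
    ...  # placeholder training logic

training_pipeline = Pipeline(
    [
        node(
            func=train_model,
            # "data_master" is loaded from catalog.yml,
            # "params:train_power_model" from parameters.yml
            inputs=["data_master", "params:train_power_model"],
            # the trained model is saved to the "train_model_power" catalog entry
            outputs="train_model_power",
        )
    ]
)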
r
Thank you! Is it OK if I continue this thread in case of further issues?
m
of course 🙂