# questions
r
Hi! I am currently migrating Kedro from 0.17.2 to 0.18.1. In the current version (0.17.2), the config loader and catalog are registered in hooks.py via the `register_config_loader` and `register_catalog` methods respectively. The register config loader method returns a `TemplatedConfigLoader`, while register catalog returns a `DataCatalog`. But this hooks implementation was removed in Kedro 0.18.*. Could anyone please suggest how these methods can be implemented in Kedro 0.18.1?
Copy code
from typing import Any, Dict, Iterable, Optional

from kedro.config import TemplatedConfigLoader
from kedro.framework.hooks import hook_impl
from kedro.io import DataCatalog


class ProjectHooks:
    @hook_impl
    def register_config_loader(
        self,
        conf_paths: Iterable[str],
        env: str,
        extra_params: Dict[str, Any],
    ) -> TemplatedConfigLoader:
        return TemplatedConfigLoader(conf_paths, globals_pattern="*globals.yml")

    @hook_impl
    def register_catalog(
        self,
        catalog: Optional[Dict[str, Dict[str, Any]]],
        credentials: Dict[str, Dict[str, Any]],
        load_versions: Dict[str, str],
        save_version: str,
        # journal: Journal,
    ) -> DataCatalog:
        return DataCatalog.from_config(
            catalog, credentials, load_versions, save_version,
            # journal,
        )
m
Hi @Rassul Yermagambet, you should now use `settings.py` instead to set a custom config loader and catalog. See the migration guide https://github.com/kedro-org/kedro/blob/main/RELEASE.md#breaking-changes-to-the-api-7 as well as: https://docs.kedro.org/en/stable/kedro_project_setup/settings.html
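For your two hooks above, something along these lines in `settings.py` should be roughly equivalent (an untested sketch; the `globals_pattern` value is copied from your hook, and you probably don't need to set `DATA_CATALOG_CLASS` at all since `DataCatalog` is already the default):
Copy code
# settings.py -- minimal sketch of the 0.18.x equivalent of the old hooks
from kedro.config import TemplatedConfigLoader

# Use the templated loader instead of the default ConfigLoader
CONFIG_LOADER_CLASS = TemplatedConfigLoader

# Keyword arguments passed to the loader's constructor
CONFIG_LOADER_ARGS = {"globals_pattern": "*globals.yml"}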
r
Thank you! I was doing as stated in the docs, but was getting an error. I was able to resolve it.
@Merel Unfortunately I'm stuck at another error 😅. Could you please help to resolve it:
ValueError: Failed to format pattern '${folders.ref}': no config value found, no default provided
Filepaths with formats are defined in globals.yml as follows:
Copy code
# file /conf/base/globals
base_dir: ""
folders:
    ref: "data/reference"
    raw: "data/base/01_raw"
    intermediate: "data/base/02_intermediate"
    primary: "data/base/03_primary"
    features: "data/base/04_feature"
    model_input: "data/base/05_model_input"
    models: "data/base/06_models"
    model_output: "data/base/07_model_output"
    reporting: "data/base/08_reporting"

# timezone set for the whole pipeline
pipeline_timezone: 'UTC'
I defined the config files in settings.py as follows:
Copy code
CONFIG_LOADER_CLASS = TemplatedConfigLoader(
    # conf_source=str(Path(__file__).parents[2].resolve() / settings.CONF_SOURCE),
    conf_source = "conf",
    base_env="base",
    default_run_env="local",
)

CONFIG_LOADER_ARGS = {
                    "globals_pattern":"*globals.yml",
                    "config_pattern": {
                        "catalog": ["catalog*", "catalog*/**", "**/catalog*"],
                        "parameters": ["parameters*", "parameters*/**", "**/parameters*"],
                        "credentials": ["credentials*", "credentials*/**", "**/credentials*"],
                        }
}

CONF_CATALOG = CONFIG_LOADER_CLASS["catalog"]
CONF_CREDENTIALS = CONFIG_LOADER_CLASS["credentials"]
DATA_CATALOG_CLASS = DataCatalog.from_config(
    catalog=CONF_CATALOG, credentials=CONF_CREDENTIALS
)
m
As far as I can tell, you are using the default `TemplatedConfigLoader` settings, so you can just do:
Copy code
from kedro.config import TemplatedConfigLoader  # new import

CONFIG_LOADER_CLASS = TemplatedConfigLoader
You also don't need to overwrite the patterns like you have here in `CONFIG_LOADER_ARGS`. Just to double check though, you're using Kedro 0.18.1, right?
What you're trying to do here:
Copy code
CONF_CATALOG = CONFIG_LOADER_CLASS["catalog"]
CONF_CREDENTIALS = CONFIG_LOADER_CLASS["credentials"]
isn't possible until 0.18.4, but I also don't think this is necessary because you're using the default `DataCatalog`.
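If you want to sanity-check that your globals resolve, a rough sketch is to instantiate the loader yourself and pull a config group with `get` (the `loader["catalog"]` syntax only arrives in 0.18.4; the paths below assume your project layout):
Copy code
# quick check in a Python shell -- a sketch, not project code
from kedro.config import TemplatedConfigLoader

loader = TemplatedConfigLoader(
    conf_source="conf",          # path to your conf/ folder
    base_env="base",
    default_run_env="local",
    globals_pattern="*globals.yml",
)

# Load all catalog files; templated values such as ${folders.ref}
# should now be filled in from globals.yml
catalog_conf = loader.get("catalog*", "catalog*/**")
print(catalog_conf)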
r
I compared the changes introduced between Kedro 0.18.1 and 0.18.6 and concluded that it would be convenient to upgrade directly to Kedro 0.18.6, in order to more easily overcome some errors and implement the data-catalog-related structure. As of now, I am using Kedro 0.18.6. I tried using the default classes as you suggested:
Copy code
CONFIG_LOADER_CLASS = TemplatedConfigLoader
DATA_CATALOG_CLASS = DataCatalog
But I am getting the following error:
ValueError: Failed to format pattern '${telegram_logger.token}': no config value found, no default provided
The logger is defined in logging.yml. I tried several modifications of the class parameters, but all of the errors raised relate to the value error above.
m
In what file are you referencing `${telegram_logger.token}`, and is the value for `telegram_logger.token` within your globals?
r
I am referencing it in `logging.yml` under `conf/base`. It is not within `globals`. For context: it is used for the Telegram bot API to send log messages. I removed it for a while to make further progress, as it is not so important, but it would be great if you could help resolve that error later. After removing the Telegram handler from `logging.yml`, I got the following error:
ModularPipelineError: Failed to map datasets and/or parameters: params:train_model, params:train_model.report
Copy code
prediction_pipeline = Pipeline(
        [
                ...... 
                parameters={
                    'params:train_model': 'params:train_power_model',
                    'params:train_model.report': 'params:train_power_model.report',
                ......
        ])
`train_model` is the custom modular pipeline. It is also used as follows in `parameter.yml`:
Copy code
train_model_power:
    type: pickle.PickleDataSet
    filepath: ${folders.models}/model_power_21_02_23.pickle
    layer: train_model
But I don't understand why and how it is passed as a parameter.
m
This error
ModularPipelineError: Failed to map datasets and/or parameters: params:train_model, params:train_model.report
is basically telling you that the parameters `train_model` and `train_model.report` can't be found. Do you have parameters with those names?
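For reference, the parameter mapping in a modular pipeline only works if the inner pipeline really consumes those parameter inputs, roughly like this (a sketch only; the node function is made up, the names are taken from your snippet):
Copy code
# illustrative sketch, not your actual code
from kedro.pipeline import node, pipeline

def train(model_options):
    ...  # placeholder node function

# The inner pipeline must actually consume params:train_model as an input,
# otherwise the mapping below has nothing to map and raises ModularPipelineError
base = pipeline([node(train, inputs="params:train_model", outputs="model")])

# The wrapper remaps it to params:train_power_model, which in turn has to
# exist as a top-level key in parameters.yml when the pipeline runs
power = pipeline(
    base,
    parameters={"params:train_model": "params:train_power_model"},
    namespace="train_power_model",
)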
r
This is actually the problem from which I started this thread 😅. There is no `train_model` in the parameters. The project only has a modular pipeline named `train_model`. I did not understand how it was implemented in Kedro 0.17.2, but it worked that way. That's why I thought the issue might be related to the correct way of registering the catalog files.
m
But in this snippet you posted:
Copy code
prediction_pipeline = Pipeline(
        [
                ...... 
                parameters={
                    'params:train_model': 'params:train_power_model',
                    'params:train_model.report': 'params:train_power_model.report',
                ......
        ])
you reference `params:train_model`. So what are you trying to do there?
r
As far as I understood, the pipeline provided here is constructed using a modular pipeline named 'train_model'. However, I'm currently struggling to comprehend how the inputs and parameters are being passed without any explicit prior declaration. The only place these declarations seem to exist is in an example notebook left by the project developers, which showcases the usage of the 'train_model' module. Because of this, I'm uncertain whether Kedro offers any mechanisms to achieve this without explicit declarations:
Copy code
pipeline(
                pipe=train_model.create_pipeline().only_nodes('train_model.load_regressor',
                                                              'train_model.add_transformers',
                                                              'train_model.train_model',
                                                              'train_model.create_train_predictions',
                                                              'train_model.create_test_predictions',
                                                              'train_model.generate_performance_report'
                                                              ),
                inputs={
                    'train_model.train_set': 'data_master_train',
                    'train_model.test_set': 'data_master_test',
                    'train_model.input': 'data_master',
                    'train_model.td': 'tag_dictionary',
                },
                parameters={
                    'params:train_model': 'params:train_power_model',
                    'params:train_model.report': 'params:train_power_model.report',
                },
                outputs={
                    'train_model.train_set_model': 'train_model_power',
                    'train_model.train_set_feature_importance': 'power_model_feature_importance',
                    'train_model.train_set_predictions': 'train_set_power_model_predictions',
                    'train_model.train_set_metrics': 'train_set_power_model_metrics',
                    'train_model.test_set_predictions': 'test_set_power_model_predictions',
                    'train_model.test_set_metrics': 'test_set_power_model_metrics',
                },
                namespace='train_power_model'
            ),
m
So usually your inputs/outputs should be declared in `catalog.yml` and parameters in `parameters.yml`, and Kedro then loads and saves them for you when running the pipeline.
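As a rough illustration of that wiring (the node function is made up, and the dataset/parameter names just echo the ones in your snippets): `data_master` would be an entry in catalog.yml, `train_power_model` a top-level key in parameters.yml, and Kedro injects both into the node at run time:
Copy code
# illustrative sketch, not your actual pipeline
from kedro.pipeline import Pipeline, node

def train_model(data, model_options):
    ...  # placeholder training logic

training_pipeline = Pipeline(
    [
        node(
            func=train_model,
            # "data_master" is loaded from catalog.yml,
            # "params:train_power_model" from parameters.yml
            inputs=["data_master", "params:train_power_model"],
            # the trained model is saved to the "train_model_power" catalog entry
            outputs="train_model_power",
        )
    ]
)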
r
Thank you! Is it OK if I continue this thread in case of further issues?
m
of course 🙂