Hi all i have a question on best practice for toggling nodes Kedro #questions

Yong Bang Xiang

09/05/2023, 8:50 AM

Hi all, I have a question on best practice for toggling nodes using parameters. In particular, I would like a node to return an ML model of choice, selected by parameters defined in a .yml file. A crude example of how it would look:

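from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

def node_ML_model(model_type, model_parameter_pca, model_parameter_rf):
    # Return the model selected by the `model_type` parameter.
    if model_type == "PCA":
        return PCA(model_parameter_pca)
    if model_type == "Randomforest":
        return RandomForestClassifier(model_parameter_rf)
    raise ValueError(f"Unknown model_type: {model_type}")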

and the .yml is straightforward:

model_type : "PCA" ## or "Randomforest"
model_parameter_pca : 10
model_parameter_rf : 20

But this doesn't feel quite right, and I would like your opinions on how best to structure it. E.g. should I split node_ML_model into multiple nodes (in this example two: node_pca and node_rf) instead of one, and have a parameter in the .yml somehow control which node is selected in the pipeline?

✅ 1

Juan Luis

09/05/2023, 9:07 AM

hi @Yong Bang Xiang!
• About selecting the model: currently the conditional is your best bet, since Kedro doesn't yet have a high-level concept of "conditional node selection".
• Instead of having `model_parameter_pca` and `model_parameter_rf` separately, have you tried having a single `model_parameters`? Something like:

model_type: "PCA"  # or RF
model_parameters:
  pca: 10
  # rf: 20

and then

return PCA(model_parameters["pca"])

(haven't tested this code, let me know if it helps)
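
For illustration, the single-node conditional with one nested `model_parameters` dict could be sketched roughly like this. The `PCA` and `RandomForest` classes below are hypothetical stand-ins so the sketch runs without scikit-learn, and `MODEL_BUILDERS` and `node_ml_model` are made-up names, not Kedro API:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class PCA:
    """Hypothetical stand-in for a real PCA estimator."""
    n_components: int


@dataclass
class RandomForest:
    """Hypothetical stand-in for a real random forest estimator."""
    n_estimators: int


# Maps each allowed `model_type` value to a builder that picks its own
# entry out of the shared `model_parameters` dict.
MODEL_BUILDERS: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "PCA": lambda params: PCA(params["pca"]),
    "Randomforest": lambda params: RandomForest(params["rf"]),
}


def node_ml_model(model_type: str, model_parameters: Dict[str, Any]) -> Any:
    """Return the model selected by `model_type`, or fail loudly."""
    if model_type not in MODEL_BUILDERS:
        raise ValueError(
            f"Unknown model_type {model_type!r}; "
            f"expected one of {sorted(MODEL_BUILDERS)}"
        )
    return MODEL_BUILDERS[model_type](model_parameters)
```

In a pipeline, the node inputs would presumably be `params:model_type` and `params:model_parameters`. Raising on an unknown `model_type` avoids silently returning `None` when the .yml contains a typo.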

Yong Bang Xiang

09/05/2023, 9:25 AM

great this totally helps, thanks!

🙌🏼 1

Lodewic van Twillert

09/05/2023, 1:11 PM

@Yong Bang Xiang Here's an alternative where you don't need to define the classes in your function. You can use `kedro.utils.load_obj` to load a class directly from a string parameter, similar to how you define the class in your Data Catalog entries (e.g. `type: pandas.CSVDataSet`) and Kedro is able to load the classes from that :) Also, since your model kwargs are tied to which model you are using, I would keep the class + kwargs close together.

Params, example with some sklearn models:

models:
  nearest_neighbors:
    class: sklearn.neighbors.KNeighborsClassifier
    model_kwargs:
      n_neighbors: 3
  linear_svm:
    class: sklearn.svm.SVC
    model_kwargs:
      kernel: linear
      C: 0.025
  decision_tree:
    class: sklearn.tree.DecisionTreeClassifier
    model_kwargs:
      max_depth: 3

Node function:

from typing import Any, Optional

from kedro.utils import load_obj

def create_model(model_type: str, model_kwargs: Optional[dict[str, Any]] = None):
    """Loads the model class named by `model_type` and instantiates it with `model_kwargs`."""
    model_class = load_obj(model_type)
    # Avoid a mutable default argument: fall back to an empty dict here.
    model_obj = model_class(**(model_kwargs or {}))
    return model_obj

Example in a pipeline

from kedro.pipeline import node, pipeline

pipeline([
    node(func=create_model, inputs=dict(model_type="params:models.decision_tree.class", model_kwargs="params:models.decision_tree.model_kwargs"), outputs="model_obj")
])

-- I used this approach in an example with dynamic model pipelines here: https://github.com/Lodewic/kedro-dynamic-pipeline-hook-example/tree/main

Yong Bang Xiang

09/05/2023, 3:11 PM

this is fantastic @Lodewic van Twillert, didn't know about `kedro.utils.load_obj` before and that we could use it this way. thanks a lot!

🥳 1

Yolan Honoré-Rougé

09/05/2023, 8:57 PM

This is a great hack @Lodewic van Twillert. I think the more "kedronic" way in `kedro==0.18.13` is to use `OmegaConfigLoader` and a custom resolver to basically do what your `create_model` does. This helps load Python objects directly from the conf instead of having intermediate nodes do that.
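
A rough sketch of what such a resolver could look like, using only the standard library. The function name `resolve_object` and the resolver name `obj` are made up for illustration, and the `settings.py` wiring in the comments assumes that `CONFIG_LOADER_ARGS` accepts a `custom_resolvers` entry in this Kedro version, so treat it as untested:

```python
import importlib
from typing import Any


def resolve_object(dotted_path: str) -> Any:
    """Import and return the object named by a path like 'pkg.module.Name'."""
    module_path, _, attr_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(
            f"Expected a dotted path like 'package.module.Name', got {dotted_path!r}"
        )
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# Assumed wiring in a project's settings.py (not executed here):
#
#   from kedro.config import OmegaConfigLoader
#   CONFIG_LOADER_CLASS = OmegaConfigLoader
#   CONFIG_LOADER_ARGS = {"custom_resolvers": {"obj": resolve_object}}
#
# parameters.yml could then reference a class directly, for example:
#
#   model_class: "${obj:sklearn.tree.DecisionTreeClassifier}"
```

Since `kedro.utils.load_obj` already does this kind of dotted-path lookup, it could probably be registered as the resolver directly instead of the hand-written helper.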
