# questions
y
Hi all, I have a question on best practice for toggling nodes using parameters. In particular, I would like a node to return an ML model of choice, specified by parameters in a .yml file. A crude example of how it would look:
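```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier  # assuming scikit-learn's random forest

def node_ML_model(model_type, model_parameter_pca, model_parameter_rf):
    # Return the estimator selected by `model_type` in the .yml parameters
    if model_type == "PCA":
        return PCA(model_parameter_pca)  # first positional argument is n_components
    if model_type == "Randomforest":
        return RandomForestClassifier(model_parameter_rf)  # first positional argument is n_estimators
    raise ValueError(f"Unknown model_type: {model_type!r}")
```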
and the .yml is straightforward:
```yaml
model_type: "PCA"  # or "Randomforest"
model_parameter_pca: 10
model_parameter_rf: 20
```
But this doesn't feel quite right, and I would like your opinions on how best to structure it. For example, should I split `node_ML_model` into multiple nodes (in this example two nodes, `node_pca` and `node_rf`) instead of one, and have a parameter in the .yml somehow control which node is selected in the pipeline?
j
hi @Yong Bang Xiang!
• About selecting the model: currently the conditional is your best bet, since Kedro doesn't yet have a high-level concept of "conditional node selection".
• Instead of having `model_parameter_pca` and `model_parameter_rf` separately, have you tried having a single `model_parameters`? Something like:
```yaml
model_type: "PCA"  # or RF
model_parameters:
  pca: 10
  # rf: 20
```
and then
```python
return PCA(model_parameters["pca"])
```
(haven't tested this code, let me know if it helps)
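One possible variation on this suggestion, as a sketch only: replace the `if` chain with a lookup table keyed by the lowercased `model_type`, so the same key also finds the matching entry under `model_parameters` (`"PCA"` becomes `pca`). The helper `make_model_node` and its `registry` argument are hypothetical, not part of Kedro; the registry is passed in so the sketch runs without scikit-learn installed.

```python
from typing import Any, Callable, Dict


def make_model_node(registry: Dict[str, Callable[..., Any]]) -> Callable[[str, Dict[str, Any]], Any]:
    """Build a node function that picks a model constructor by `model_type` (hypothetical helper)."""

    def node_ml_model(model_type: str, model_parameters: Dict[str, Any]) -> Any:
        key = model_type.lower()  # "PCA" -> "pca", matching the keys under model_parameters
        if key not in registry:
            raise ValueError(f"Unknown model_type {model_type!r}; expected one of {sorted(registry)}")
        # Call the selected constructor with its own entry from model_parameters
        return registry[key](model_parameters[key])

    return node_ml_model
```

In a project, the registry would hold the real estimator classes, e.g. `{"pca": PCA, "rf": RandomForestClassifier}`, and the returned function would be wired up like any other node, e.g. with `inputs=["params:model_type", "params:model_parameters"]`. The choice still happens inside a single node, so this is a tidier conditional rather than conditional node selection.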
y
Great, this totally helps, thanks!
l
@Yong Bang Xiang Here's an alternative where you don't need to define the classes in your function. You can use `kedro.utils.load_obj` to load a class directly from a string parameter, similar to how you define the class in your data catalog entries (e.g. `type: pandas.CSVDataSet`), from which Kedro is able to load the class :) Also, since your model kwargs are tied to which model you are using, I would keep the class and kwargs close together.
`params`, an example with some sklearn models:
```yaml
models:
  nearest_neighbors:
    class: sklearn.neighbors.KNeighborsClassifier
    model_kwargs:
      n_neighbors: 3
  linear_svm:
    class: sklearn.svm.SVC
    model_kwargs:
      kernel: linear
      C: 0.025
  decision_tree:
    class: sklearn.tree.DecisionTreeClassifier
    model_kwargs:
      max_depth: 3
```
Node function:
```python
from typing import Any, Dict, Optional

from kedro.utils import load_obj


def create_model(model_type: str, model_kwargs: Optional[Dict[str, Any]] = None) -> Any:
    """Loads the model class from the dotted path `model_type` and instantiates it with `model_kwargs`."""
    model_class = load_obj(model_type)  # e.g. "sklearn.tree.DecisionTreeClassifier"
    model_obj = model_class(**(model_kwargs or {}))  # None default avoids a shared mutable dict
    return model_obj
```
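Example in a pipeline:
```python
from kedro.pipeline import node, pipeline

# Parameters are referenced with the "params:" prefix; nested keys are joined with dots
pipeline([
    node(func=create_model, inputs=dict(model_type="params:models.decision_tree.class", model_kwargs="params:models.decision_tree.model_kwargs"), outputs="model_obj")
])
```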
-- I used this approach in an example with dynamic model pipelines here: https://github.com/Lodewic/kedro-dynamic-pipeline-hook-example/tree/main
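To illustrate roughly what happens to a string such as `sklearn.tree.DecisionTreeClassifier`, here is a standard-library sketch: split off the last dotted component, import the module, and fetch the attribute. `load_obj_sketch` is a hypothetical stand-in written for this example, not Kedro's actual `load_obj` implementation; `create_model` mirrors the node function in the message above.

```python
import importlib
from typing import Any, Dict, Optional


def load_obj_sketch(obj_path: str) -> Any:
    """Rough stand-in for kedro.utils.load_obj: import "package.module.Name" and return Name."""
    module_path, _, obj_name = obj_path.rpartition(".")
    module = importlib.import_module(module_path)
    if not hasattr(module, obj_name):
        raise AttributeError(f"Object `{obj_name}` cannot be loaded from `{module_path}`.")
    return getattr(module, obj_name)


def create_model(model_type: str, model_kwargs: Optional[Dict[str, Any]] = None) -> Any:
    """Instantiate the class named by `model_type` with keyword arguments taken from parameters."""
    model_class = load_obj_sketch(model_type)
    return model_class(**(model_kwargs or {}))
```

With a `models` parameter shaped like the YAML in that message, every model could be built in one pass, e.g. `{name: create_model(cfg["class"], cfg["model_kwargs"]) for name, cfg in models.items()}`.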
y
This is fantastic @Lodewic van Twillert, I didn't know about `kedro.utils.load_obj` before, or that we could use it this way. Thanks a lot!
y
This is a great hack @Lodewic van Twillert. I think the more "kedronic" way in `kedro==0.18.13` is to use `OmegaConfigLoader` and a custom resolver to do basically what your `create_model` does. This lets you load Python objects directly from the conf instead of having intermediate nodes do that.
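A minimal sketch of that idea, assuming a Kedro version whose `OmegaConfigLoader` accepts a `custom_resolvers` entry in `CONFIG_LOADER_ARGS` (set in the project's `settings.py`). The resolver name `obj` and the function `resolve_object` are hypothetical; `kedro.utils.load_obj` could be registered in its place. The body only uses the standard library so it can be run on its own.

```python
import importlib
from typing import Any


def resolve_object(dotted_path: str) -> Any:
    """Return the object named by a dotted path, e.g. "sklearn.tree.DecisionTreeClassifier"."""
    module_path, _, attr_name = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_path), attr_name)


# settings.py: register the resolver under the (hypothetical) name "obj".
# On Kedro 0.18.x, OmegaConfigLoader is not the default loader, so you would also add:
#   from kedro.config import OmegaConfigLoader
#   CONFIG_LOADER_CLASS = OmegaConfigLoader
CONFIG_LOADER_ARGS = {
    "custom_resolvers": {
        "obj": resolve_object,
    },
}
```

A parameter written as `class: "${obj:sklearn.tree.DecisionTreeClassifier}"` should then be resolved when the configuration is loaded, so the node receives the class itself rather than a string and only has to call it with `model_kwargs`.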