Yong Bang Xiang
09/05/2023, 8:50 AM
def node_ML_model(model_type, model_parameter_pca, model_parameter_rf):
    if model_type == "PCA":
        return PCA(model_parameter_pca)
    if model_type == "Randomforest":
        return Randomforest(model_parameter_rf)
and the .yml is straightforward:
model_type: "PCA"  # or "Randomforest"
model_parameter_pca: 10
model_parameter_rf: 20
but this doesn't feel quite right, i would like your opinions on how to best structure it.
e.g. should i separate node_ML_model into multiple nodes (in this example 2 nodes: node_pca and node_rf) instead of one, and have a parameter in the .yml control which node is selected in the pipeline?
Juan Luis
09/05/2023, 9:07 AM
instead of having model_parameter_pca and model_parameter_rf separately, have you tried having a single model_parameters? something like
model_type: "PCA"  # or RF
model_parameters:
  pca: 10
  # rf: 20
and then
    return PCA(model_parameters["pca"])
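A minimal sketch of how the node could dispatch on model_type when there is a single model_parameters entry. The PCA and Randomforest classes below are hypothetical stand-ins (the thread does not show the real ones), and MODEL_REGISTRY and the "rf" key are illustrative names, not part of Kedro.

```python
from typing import Any, Callable, Dict, Tuple


# Hypothetical stand-ins for the PCA and Randomforest classes used in the thread.
class PCA:
    def __init__(self, n_components: int) -> None:
        self.n_components = n_components


class Randomforest:
    def __init__(self, n_estimators: int) -> None:
        self.n_estimators = n_estimators


# Maps each model_type value to (constructor, key inside model_parameters).
MODEL_REGISTRY: Dict[str, Tuple[Callable[[Any], Any], str]] = {
    "PCA": (PCA, "pca"),
    "Randomforest": (Randomforest, "rf"),
}


def node_ML_model(model_type: str, model_parameters: Dict[str, Any]) -> Any:
    """Build the model selected by model_type from a single parameters dict."""
    if model_type not in MODEL_REGISTRY:
        raise ValueError(f"Unknown model_type: {model_type!r}")
    constructor, param_key = MODEL_REGISTRY[model_type]
    return constructor(model_parameters[param_key])


# Mirrors the .yml above: model_type "PCA" with model_parameters {"pca": 10}.
model = node_ML_model("PCA", {"pca": 10})
```

Adding a new model then only needs a new registry entry and a new key under model_parameters, with no extra branch in the node.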
(haven't tested this code, let me know if it helps)
Yong Bang Xiang
09/05/2023, 9:25 AM
Lodewic van Twillert
09/05/2023, 1:11 PM
You can use kedro.utils.load_obj to load a class directly from a string parameter. This is similar to how you define the class in your data catalog entries, e.g. type: pandas.CSVDataSet, and Kedro is able to load the classes from that :)
Also since your model kwargs are tied to which model you are using, I would keep the class + kwargs close together.
params, example with some sklearn models:
models:
  nearest_neighbors:
    class: sklearn.neighbors.KNeighborsClassifier
    model_kwargs:
      n_neighbors: 3
  linear_svm:
    class: sklearn.svm.SVC
    model_kwargs:
      kernel: linear
      C: 0.025
  decision_tree:
    class: sklearn.tree.DecisionTreeClassifier
    model_kwargs:
      max_depth: 3
node function:
from typing import Any, Dict, Optional

from kedro.utils import load_obj


def create_model(model_type: str, model_kwargs: Optional[Dict[str, Any]] = None):
    """Loads the model class from `model_type` and instantiates it with `model_kwargs`."""
    # model_type is a dotted path such as "sklearn.svm.SVC"
    model_class = load_obj(model_type)
    # None instead of a mutable {} default; treated as "no keyword arguments"
    model_obj = model_class(**(model_kwargs or {}))
    return model_obj
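To make the mechanism concrete without needing Kedro or sklearn installed, here is a rough sketch of the same idea using only the standard library. load_class is a hypothetical stand-in for load_obj built on importlib (it is not Kedro's implementation), and the models dict mirrors the layout of the params above but points at stdlib classes purely for illustration.

```python
import importlib
from typing import Any, Dict, Optional


def load_class(obj_path: str) -> Any:
    """Import and return the object named by a dotted path like 'pkg.module.Name'."""
    module_path, _, obj_name = obj_path.rpartition(".")
    if not module_path:
        raise ValueError(f"Expected a dotted path, got {obj_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, obj_name)


def create_model(model_type: str, model_kwargs: Optional[Dict[str, Any]] = None) -> Any:
    """Instantiate the class named by model_type with the given keyword arguments."""
    model_class = load_class(model_type)
    return model_class(**(model_kwargs or {}))


# Illustrative parameters: same class + model_kwargs shape as the YAML,
# with stdlib classes standing in for the sklearn estimators.
models = {
    "fraction": {
        "class": "fractions.Fraction",
        "model_kwargs": {"numerator": 3, "denominator": 4},
    },
    "counter": {"class": "collections.Counter", "model_kwargs": {}},
}

# One object per entry, built purely from configuration.
built = {
    name: create_model(cfg["class"], cfg["model_kwargs"])
    for name, cfg in models.items()
}
```

Keeping class and model_kwargs side by side means swapping a model is a config-only change; the node code never names a concrete class.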
Example in a pipeline
from kedro.pipeline import node, pipeline
pipeline([
    node(func=create_model, inputs=dict(model_type="params:models.decision_tree.class", model_kwargs="params:models.decision_tree.model_kwargs"), outputs="model_obj")
])
-- I used this approach in an example with dynamic model pipelines here: https://github.com/Lodewic/kedro-dynamic-pipeline-hook-example/tree/main
Yong Bang Xiang
09/05/2023, 3:11 PM
i didn't know about kedro.utils.load_obj before, or that we could use it this way. thanks a lot!
Yolan Honoré-Rougé
09/05/2023, 8:57 PM
Another option with kedro==0.18.13 is to use ``OmegaConfigLoader`` and a custom resolver to do basically what your create_model does. This helps load Python objects directly from the conf instead of having intermediate nodes to do that.