Andrea Maioli
09/13/2024, 2:20 PM
Model1Config:
  type: yaml.YAMLDataset
  filepath: src/mykedro/models/Model1/config.yaml
Model2Config:
  type: yaml.YAMLDataset
  filepath: src/mykedro/models/Model2/config.yaml
I would like to use these config files in a loop (for example, to train the two models sequentially).
For that, I need access to the catalog (or, even better, to the current session) inside the node.
My first idea was to create a dummy node that exposes the session as a variable:
from kedro.framework.session.session import get_current_session

def load_session():
    return get_current_session()

node(
    func=load_session,
    inputs=None,
    outputs='current_session',
),
The problem is that get_current_session is deprecated and no longer available (I am using Kedro 0.19).
My question is: how can I pass these values to node inputs?
Note that creating a new KedroSession in load_session, like this:
from kedro.framework.session import KedroSession

def load_session():
    with KedroSession.create() as session:
        return session
is not working for me (I receive a memory error).
Yury Fedotov
09/13/2024, 5:25 PM
def train_model(data: pd.DataFrame, config: dict) -> BaseEstimator:
    ...
Here, config expects the dictionary stored in your YAML files.
Next, when defining a pipeline, you wrap this function in a node, and it is during this wrapping that you point it to the datasets you've defined in the catalog:
# Inside this function
def create_pipeline() -> Pipeline:
    return Pipeline([
        ...
        node(
            func=train_model,
            inputs={
                "data": "how_your_data_is_called_in_catalog",
                "config": "Model1Config",  # This is the name of what you have in catalog
            },
            outputs="trained_model_1",
        )
        ...
    ])
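For the "loop over configs" part of the question, one possible sketch is to generate one node per model when the pipeline is built, so each node receives its own catalog entry as a plain input. The `MODEL_NAMES` list, the `node_kwargs_for` helper, and the `trained_<name>` / `train_<name>` naming scheme are assumptions for illustration; only `Model1Config`, `Model2Config`, and the data placeholder come from the messages above.

```python
from typing import Any

# Model names matching the catalog entries Model1Config and Model2Config.
MODEL_NAMES = ["Model1", "Model2"]


def train_model(data: Any, config: dict) -> Any:
    """Placeholder for the real training function."""
    raise NotImplementedError


def node_kwargs_for(model_name: str) -> dict:
    """Build the keyword arguments for one training node."""
    return {
        "func": train_model,
        "inputs": {
            "data": "how_your_data_is_called_in_catalog",
            "config": f"{model_name}Config",  # catalog entry for this model
        },
        # Hypothetical naming scheme for the output dataset and the node.
        "outputs": f"trained_{model_name}",
        "name": f"train_{model_name}",
    }


def create_pipeline(**kwargs) -> "Pipeline":
    # Imported lazily so the helper above can be inspected without Kedro installed.
    from kedro.pipeline import Pipeline, node

    return Pipeline([node(**node_kwargs_for(name)) for name in MODEL_NAMES])
```

Note that Kedro orders nodes by their data dependencies rather than by their position in the list, so two independent training nodes like these are not guaranteed to run in list order.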
Yury Fedotov
09/13/2024, 5:27 PM
session object, or even know that it exists
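A small sketch of why a node function can stay unaware of the session: it is an ordinary Python function, so it can be called directly with an in-memory dict in place of the catalog-loaded config. The stand-in function body and the `learning_rate` key are assumptions for illustration, not part of the original project.

```python
def train_model(data, config):
    """Stand-in for the real training function: returns a summary, not a fitted model."""
    return {"n_rows": len(data), "params": dict(config)}


# Hypothetical config contents; the real keys come from each model's config.yaml.
fake_config = {"learning_rate": 0.1}

# No catalog or session involved: the function only sees its arguments.
result = train_model(data=[[1, 2], [3, 4]], config=fake_config)
```

When the pipeline runs, Kedro loads `Model1Config` from the catalog and passes the resulting dict as `config` in the same way.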