Artur Dobrogowski
10/17/2024, 9:21 AM
Nok Lam Chan
10/17/2024, 9:27 AM
Artur Dobrogowski
10/17/2024, 9:38 AM
Artur Dobrogowski
10/17/2024, 9:41 AM
Artur Dobrogowski
10/17/2024, 9:42 AM
Artur Dobrogowski
10/17/2024, 9:45 AM
Artur Dobrogowski
10/17/2024, 9:47 AM
Kacper Leśniara
10/17/2024, 9:51 AM
Artur Dobrogowski
10/17/2024, 9:53 AM
Merel
10/17/2024, 10:03 AM
Artur Dobrogowski
10/17/2024, 10:05 AM
Elena Khaustova
10/17/2024, 10:48 AM
Elena Khaustova
10/17/2024, 1:45 PM
Nok Lam Chan
10/17/2024, 2:05 PM
{}
when I run this on GitPod on 0.19.8 and 0.19.9
Nok Lam Chan
10/17/2024, 2:05 PM
def test_data_science_pipeline(caplog, dummy_data, dummy_parameters):
    pipeline = (
        create_ds_pipeline()
        .from_nodes("split_data_node")
        .to_nodes("evaluate_model_node")
    )
    catalog = DataCatalog()
    catalog.add_feed_dict(
        {
            "model_input_table": dummy_data,
            "params:model_options": dummy_parameters["model_options"],
        }
    )
    a = SequentialRunner().run(pipeline, catalog)
    b = SequentialRunner().run(pipeline, catalog)
    assert a == b
Elena Khaustova
10/17/2024, 2:07 PM
pipeline = (
    create_ds_pipeline()
    .from_nodes("split_data_node")
    .to_nodes("train_model_node")
)
Elena Khaustova
10/17/2024, 2:09 PM
evaluate_model_node does not return anything
Elena Khaustova
10/17/2024, 2:10 PM
free_outputs
Nok Lam Chan
10/17/2024, 2:16 PM
Nok Lam Chan
10/17/2024, 2:16 PM
Nok Lam Chan
10/17/2024, 2:21 PM
Nok Lam Chan
10/17/2024, 2:22 PM
pipeline.outputs()={'y_test', 'X_test', 'regressor'}
registered_ds=['params:model_options', 'model_input_table']
memory_datasets={'model_input_table', 'params:model_options'}
free_outputs={'y_test', 'X_test', 'regressor'}
pipeline.outputs()={'y_test', 'X_test', 'regressor'}
registered_ds=['X_test', 'params:model_options', 'model_input_table', 'X_train', 'regressor', 'y_test', 'y_train']
memory_datasets={'model_input_table', 'params:model_options'}
free_outputs=set()
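The two sets of printed values above can be reproduced with plain set arithmetic. This is only a sketch: it assumes free_outputs is computed as the pipeline outputs minus the registered datasets that are not memory datasets, which matches the numbers printed here but is not confirmed in this thread. The helper name compute_free_outputs is made up for illustration.

```python
# Values copied from the debug output above.
pipeline_outputs = {"y_test", "X_test", "regressor"}
memory_datasets = {"model_input_table", "params:model_options"}

first_registered = ["params:model_options", "model_input_table"]
second_registered = [
    "X_test", "params:model_options", "model_input_table",
    "X_train", "regressor", "y_test", "y_train",
]


def compute_free_outputs(outputs, registered_ds, memory_ds):
    # Assumed formula (hypothetical helper): an output is "free" unless it is
    # registered in the catalog as something other than a memory dataset.
    return outputs - (set(registered_ds) - memory_ds)


first_run = compute_free_outputs(pipeline_outputs, first_registered, memory_datasets)
second_run = compute_free_outputs(pipeline_outputs, second_registered, memory_datasets)

# What the second run would give if the three outputs were also
# recognised as memory datasets.
expected_second_run = compute_free_outputs(
    pipeline_outputs, second_registered, memory_datasets | pipeline_outputs
)

print(first_run)
print(second_run)
print(expected_second_run)
```

Under that assumption, the second run only loses its outputs because the three datasets show up in registered_ds without also showing up in memory_datasets.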
I can see now that on the 2nd run we return nothing for free_outputs. I expected 'y_test', 'X_test', 'regressor' to be in memory_datasets, but they are not. That is why free_outputs is missing them at the end.
Nok Lam Chan
10/17/2024, 2:47 PM
Nok Lam Chan
10/17/2024, 2:50 PM
Elena Khaustova
10/17/2024, 3:26 PM
"but by shifting all those free_outputs declaration after the shallow copy, I get the expected output correctly"
Can you please give an example of what you mean? Moving the shallow copy should not change it, and in the new catalog this method will be removed anyway.
Yolan Honoré-Rougé
10/18/2024, 6:19 AM
(DataCatalog) at saving time, but it runs with the one in your environment at loading time. If there is a mismatch, the object either does not load, or behaves like the class as defined at loading time (e.g. here with the behaviour of the latest version of kedro).
Specifically here, once the bug is fixed you can just upgrade your kedro version and it should resume working normally (no need to retrain the whole model). More generally, this issue on catalog serialisation should help kedro-mlflow models be more stable over time and not break between kedro versions (e.g. just because a private internal attribute changes).
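The "saving time vs loading time" behaviour can be seen with plain pickle, independent of kedro. In this minimal sketch the Catalog class is a hypothetical stand-in, not kedro's DataCatalog and not how kedro-mlflow actually stores models. A pickled instance keeps its attributes plus a reference to the class by name, so the methods that run after loading are whatever that name points to in the loading environment.

```python
import pickle


class Catalog:
    """Hypothetical stand-in for the class as installed at saving time."""

    def __init__(self):
        self._datasets = {"regressor": "model"}

    def describe(self):
        return "v1"


# Saving time: the instance state and a reference to the class name are stored,
# not the code of the class itself.
payload = pickle.dumps(Catalog())


class Catalog:  # simulates an upgraded library that redefines the same class
    def describe(self):
        return "v2"


# Loading time: the name is resolved again, so the restored object uses the
# new definition while keeping the attributes that were saved.
restored = pickle.loads(payload)

print(restored.describe())
print(restored._datasets)
```

The saved attributes survive while the behaviour follows the installed class, which is why renaming a private attribute between versions can break an old pickle, and why upgrading to a fixed version can repair one without re-saving.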