# questions
a
Hi, I'm testing after upgrading to 0.19.9 and I found what seems like a bug: after running the pipeline a second time with a runner (e.g. during test cases), the output is no longer saved in the catalog or returned as a value from the pipeline. That wasn't the case in 0.19.8.
👀 3
n
Could you share an example? This sounds suspicious. Which runner are you using? I only recall a minor change to ThreadRunner.
a
trying to reproduce with spaceflights
Ok
I'll make a GitHub ticket
👍🏼 1
@Kacper Leśniara
k
This unfortunately means the packaged model servers built with kedro-mlflow work only once; after that they need a reboot. FYI @Yolan Honoré-Rougé
👍 1
a
when will we start making things that actually last? single-use cutlery, single-use batteries, and now we get single-use servers >...<
🤣 3
m
Is this caused by the catalog work? @Elena Khaustova
a
it seems like it
👀 1
e
I was able to reproduce it, looking into it
🙌 3
Here is the explanation and fix: https://github.com/kedro-org/kedro/pull/4236
👀 1
🙌 1
n
I left a comment there. It's unclear to me why it breaks (?). I haven't been able to reproduce the error yet: I got {} for both a and b when I ran this on GitPod on 0.19.8 and 0.19.9. Is this how your test looks?
from kedro.io import DataCatalog
from kedro.runner import SequentialRunner

# create_ds_pipeline comes from the project's data science pipeline module;
# dummy_data and dummy_parameters are pytest fixtures defined elsewhere.


def test_data_science_pipeline(caplog, dummy_data, dummy_parameters):
    pipeline = (
        create_ds_pipeline()
        .from_nodes("split_data_node")
        .to_nodes("evaluate_model_node")
    )
    catalog = DataCatalog()
    catalog.add_feed_dict(
        {
            "model_input_table": dummy_data,
            "params:model_options": dummy_parameters["model_options"],
        }
    )

    a = SequentialRunner().run(pipeline, catalog)
    b = SequentialRunner().run(pipeline, catalog)
    assert a == b
e
@Nok Lam Chan change the test and you’ll reproduce
pipeline = (
    create_ds_pipeline()
    .from_nodes("split_data_node")
    .to_nodes("train_model_node")
)
evaluate_model_node does not return anything, and there are no free_outputs
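Elena's point can be sketched without kedro at all. This is a toy model, not kedro's actual runner code: a runner returns only the pipeline's "free outputs", i.e. outputs not registered in the catalog, so a pipeline ending in a node that returns nothing yields {} on every run, and a == b passes trivially whether the bug is present or not.

```python
def run(pipeline_outputs, registered_datasets):
    """Toy stand-in for Runner.run: return only unregistered outputs."""
    free_outputs = pipeline_outputs - registered_datasets
    return {name: f"<data for {name}>" for name in free_outputs}

# Ending at evaluate_model_node: the pipeline produces no outputs at all.
print(run(set(), {"model_input_table"}))          # -> {}
# Ending at train_model_node: "regressor" is unregistered, so it is returned.
print(run({"regressor"}, {"model_input_table"}))  # -> {'regressor': '<data for regressor>'}
```

This is why changing the test to stop at train_model_node exposes the bug: only then does the runner have anything to return.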
n
ya ok, as the issue describes, using the test we have in the starter I cannot reproduce it.
let me try
@Elena Khaustova I updated the comment there with the new test; I still think there is an issue with the memory dataset definition
pipeline.outputs()={'y_test', 'X_test', 'regressor'}
registered_ds=['params:model_options', 'model_input_table']
memory_datasets={'model_input_table', 'params:model_options'}
free_outputs={'y_test', 'X_test', 'regressor'}


pipeline.outputs()={'y_test', 'X_test', 'regressor'}
registered_ds=['X_test', 'params:model_options', 'model_input_table', 'X_train', 'regressor', 'y_test', 'y_train']
memory_datasets={'model_input_table', 'params:model_options'}
free_outputs=set()
I can see now that on the 2nd run we return nothing for free_outputs. I expect 'y_test', 'X_test', 'regressor' to be in memory_datasets, but they are not; that is why free_outputs is missing them at the end.
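The collapse in the debug output is just set arithmetic over the values printed above: the two runs share one catalog, so the first run registers its outputs and the second run finds nothing "free" left to return.

```python
# Values taken verbatim from the debug output above.
pipeline_outputs = {"y_test", "X_test", "regressor"}

registered_run1 = {"params:model_options", "model_input_table"}
registered_run2 = registered_run1 | {
    "X_test", "X_train", "y_test", "y_train", "regressor"
}

print(pipeline_outputs - registered_run1)  # 1st run: {'y_test', 'X_test', 'regressor'}
print(pipeline_outputs - registered_run2)  # 2nd run: set()
```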
I think the issue is with the shallow copy instead. Those free_outputs are initialised before the copy was made, so they end up holding incorrect references.
I don't understand the need for the shallow copy, but by moving all those free_outputs declarations to after the shallow copy, I get the expected output.
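A minimal sketch of the ordering bug being described, using a plain dict as a stand-in for the catalog. run_buggy and run_fixed and everything inside them are illustrative assumptions, not kedro's real runner code:

```python
import copy

def run_buggy(pipeline_outputs, catalog):
    # free_outputs derived from the *shared* catalog before any copy...
    free_outputs = pipeline_outputs - set(catalog)
    # ...while node results are registered on that same shared catalog,
    # so a second run sees every output as already registered.
    for name in pipeline_outputs:
        catalog[name] = f"<{name}>"
    return {name: catalog[name] for name in free_outputs}

def run_fixed(pipeline_outputs, catalog):
    local = copy.copy(catalog)                    # shallow-copy first
    free_outputs = pipeline_outputs - set(local)  # then derive free_outputs
    for name in pipeline_outputs:
        local[name] = f"<{name}>"                 # mutate only the per-run copy
    return {name: local[name] for name in free_outputs}

shared = {"model_input_table": "..."}
print(run_buggy({"regressor"}, shared))  # -> {'regressor': '<regressor>'}
print(run_buggy({"regressor"}, shared))  # -> {} (shared state polluted)

shared = {"model_input_table": "..."}
print(run_fixed({"regressor"}, shared))  # -> {'regressor': '<regressor>'}
print(run_fixed({"regressor"}, shared))  # -> {'regressor': '<regressor>'}
```

The fix sketched here reorders the operations so the per-run copy is taken before anything is derived from, or written to, the catalog.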
e
> but by shifting all those free_outputs declaration after the shallow copy, I get the expected output correctly
can you please give an example of what you mean? Moving the shallow copy should not change it, and in the new catalog this method will be removed anyway.
y
> This unfortunately results in making the packaged model servers with kedro-mlflow work only once, then they need a reboot. FYI @Yolan Honoré-Rougé

This is due to kedro-mlflow packaging objects as a pickle. When an object is pickled, its structure is defined by the class (DataCatalog) as it exists at saving time, but it runs with the class defined in your environment at loading time. If there is a mismatch, the object does not load, or behaves as the class is defined at loading time (e.g. here, with the behaviour of the latest kedro version). Specifically, once the bug is fixed you can just upgrade your kedro version and it should resume working normally (no need to retrain the whole model). More generally, this issue on catalog serialisation should help kedro-mlflow models be more stable over time and not break between kedro versions (e.g. just because a private internal attribute changes).
🙌 1
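The save-time vs load-time class mismatch Yolan describes can be demonstrated with pickle alone. Catalog here is a toy class standing in for kedro's DataCatalog, and redefining it in the same process simulates upgrading the library between save and load:

```python
import pickle

class Catalog:                       # class as defined at *saving* time
    def __init__(self):
        self.datasets = {"a": 1}
    def describe(self):
        return f"v1: {self.datasets}"

blob = pickle.dumps(Catalog())

# pickle stores only the instance state plus a reference to the class
# *by name*, so the method code that runs after loading comes from the
# class currently defined in the environment, not the one saved.
class Catalog:                       # class as defined at *loading* time
    def describe(self):
        return f"v2: {self.datasets}"

restored = pickle.loads(blob)
print(restored.describe())           # -> v2: {'a': 1}  (old data, new code)
```

This is why a packaged model pickled under one kedro version can start misbehaving (or failing to load) after the environment's kedro version changes, and why it resumes working once the environment's class behaves correctly again.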