# questions
j
Hi! I'm wondering about the way to overcome the `DatasetError: Saving 'None' to a 'dataset' is not allowed` error. I have dynamic pipelines for certain classifiers (e.g., separate ones for XGBoost, CNN, etc.). On training, I define a scaler that should be saved to a dataset as a pickle object, so I can use the same scaler on validation etc. When using XGBoost, I want to define the scaler as `None`, but I get the `DatasetError: Saving 'None' to a 'dataset' is not allowed` error. Is there a smart way to save it anyway?
m
Hi Julian, good question! Which dataset are you using? You might need to create a custom dataset on top of the one you're using.
I'm not entirely sure you can override the behaviour, though, as it's part of the core `AbstractDataset`.
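As a catalog-agnostic alternative to a custom dataset, one option is to wrap the optional scaler in a tiny container so the pickled object is never literally `None`. This is just a sketch, not part of Kedro's API; the `MaybeScaler` name is made up for illustration:

```python
import pickle
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class MaybeScaler:
    """Container so the object handed to the dataset is never a bare None."""
    scaler: Optional[Any] = None


# The training node would save MaybeScaler(None) for XGBoost and
# MaybeScaler(fitted_scaler) otherwise; downstream nodes unwrap .scaler.
blob = pickle.dumps(MaybeScaler(None))
restored = pickle.loads(blob)
assert restored.scaler is None
```

The dataset then only ever sees a `MaybeScaler` instance, so the `None` guard in `save` is never triggered.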
n
Is there a way to avoid saving/loading it when using XGBoost instead?
j
I was simply using `pickle.PickleDataset`. As a way to overcome this, for now I do:
```python
scaler = 0  # TODO: change to None once saving None is possible

if architecture != "xgboost":
    X, scaler = _scale(X)
```
My node looks like:
```python
node(
    func=split_data,
    inputs=[
        "model_input_data",
        "params:architecture",
        "params:model_options",
    ],
    outputs=[
        "X_train",
        "X_test",
        "y_train",
        "y_test",
        "sample_weights",
        "scaler",
    ],
    name="split_data_node",
    tags="split",
),
```
So I'm not sure I can avoid passing it at all without giving up the modularity of the pipeline. I think saving it as `None` would be cleaner, but maybe saving it as `0` is simpler than creating a custom dataset. Btw, pickle normally accepts `None` being saved.
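For reference, a quick check that plain pickle has no problem round-tripping `None`, so the restriction really comes from the dataset layer's `save` guard rather than from pickle itself:

```python
import pickle

# Pickle serializes and restores None without complaint.
blob = pickle.dumps(None)
assert pickle.loads(blob) is None
```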