#questions

Julian Nowak

03/15/2024, 9:16 AM
Hi! I'm wondering how to get around the
DatasetError: Saving 'None' to a dataset is not allowed
error. I have dynamic pipelines for certain classifiers (e.g., separate ones for XGBoost, CNN, etc.). During training I define a scaler that should be saved to a dataset as a pickle object, so I can use the same scaler on validation etc. When using XGBoost, I want to define the scaler as
None
, but I get the
DatasetError: Saving 'None' to a dataset is not allowed
error. Is there a smart way to save it anyway?

Merel

03/15/2024, 2:06 PM
Hi Julian, good question! Which dataset are you using? You might need to create a custom dataset on top of the one you're using.
I'm not entirely sure you can overwrite the behaviour, though, as the None check is part of the core AbstractDataset.
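If you do go the custom dataset route, the core trick would be swapping None for a sentinel on save and back on load. A minimal sketch of that mechanism with plain pickle (a PickleDataset subclass would do the same swap in overridden save()/load() methods; the sentinel value and function names here are made up for illustration):

```python
import os
import pickle
import tempfile

# Sketch only: swap None for a sentinel before pickling and swap it back on
# load. In Kedro, the same swap would live in a hypothetical PickleDataset
# subclass overriding save()/load(), because AbstractDataset.save() rejects
# None before the underlying _save() ever runs.
_NONE_SENTINEL = "__none_placeholder__"  # made-up marker value

def save_optional(obj, path):
    with open(path, "wb") as f:
        pickle.dump(_NONE_SENTINEL if obj is None else obj, f)

def load_optional(path):
    with open(path, "rb") as f:
        obj = pickle.load(f)
    return None if obj == _NONE_SENTINEL else obj

# Round trip: None survives where a plain dataset save would error out.
path = os.path.join(tempfile.mkdtemp(), "scaler.pkl")
save_optional(None, path)
restored = load_optional(path)
```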

Nok Lam Chan

03/15/2024, 4:55 PM
Is there a way to avoid loading it when using XGBoost instead?

Julian Nowak

03/19/2024, 3:14 PM
I was simply using
pickle.PickleDataset
. As a way to overcome this, for now I do:
scaler = 0  # TODO change to None

if architecture != "xgboost":
    X, scaler = _scale(X)
My node looks like:
node(
    func=split_data,
    inputs=[
        "model_input_data",
        "params:architecture",
        "params:model_options",
    ],
    outputs=[
        "X_train",
        "X_test",
        "y_train",
        "y_test",
        "sample_weights",
        "scaler",
    ],
    name="split_data_node",
    tags="split",
),
So I'm not sure if I can avoid passing it at all without giving up the modularity of the pipeline. I think saving it as
None
would be cleanest, but maybe saving it as 0 is simpler than creating a custom dataset. BTW, pickle normally accepts
None
to be saved.
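Another way to keep the pipeline uniform without touching datasets: in the XGBoost branch, return a no-op scaler instead of None or 0. It pickles fine, and downstream nodes can call transform() unconditionally. A minimal hand-rolled sketch (scikit-learn's FunctionTransformer() with no arguments would behave the same way; the class name here is made up):

```python
import pickle

class IdentityScaler:
    """No-op scaler: a picklable stand-in for None in the catalog."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X  # passes data through unchanged

    def fit_transform(self, X, y=None):
        return X

# The XGBoost branch can return this instead of None/0:
scaler = IdentityScaler()
scaled = scaler.transform([1, 2, 3])
# Survives the pickle round trip that the catalog entry performs:
restored = pickle.loads(pickle.dumps(scaler))
```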