# questions
j
Hi! I'm wondering about the way to overcome the `DatasetError: Saving 'None' to a 'dataset' is not allowed` error. I have dynamic pipelines for certain classifiers (e.g., separate ones for XGBoost, CNN, etc.). On training, I define a scaler that should be saved to a dataset as a pickle object, so I can use the same scaler on validation etc. When using XGBoost, I want to define the scaler as `None`, but I get the `DatasetError: Saving 'None' to a 'dataset' is not allowed` error. Is there a smart way to save it anyway?
m
Hi Julian, good question! Which dataset are you using? You might need to create a custom dataset on top of the one you're using.
I'm not entirely sure you can override the behaviour, though, as it's part of the core `AbstractDataset`.
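As a catalog-agnostic alternative to a custom dataset, one option is to wrap the optional scaler in a tiny container so the pickled object is never literally `None`. This is just a sketch, not part of Kedro's API; the `MaybeScaler` name is made up for illustration:

```python
import pickle
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class MaybeScaler:
    """Container so the object handed to the dataset is never a bare None."""
    scaler: Optional[Any] = None


# The training node would save MaybeScaler(None) for XGBoost and
# MaybeScaler(fitted_scaler) otherwise; downstream nodes unwrap .scaler.
blob = pickle.dumps(MaybeScaler(None))
restored = pickle.loads(blob)
assert restored.scaler is None
```

The dataset then only ever sees a `MaybeScaler` instance, so the `None` guard in `save` is never triggered.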
n
Is there a way to avoid saving/loading it when using XGBoost instead?
j
I was simply using `pickle.PickleDataset`. As a way to overcome this, for now I do:
```python
scaler = 0  # TODO: change to None once saving None is possible

if architecture != "xgboost":
    X, scaler = _scale(X)
```
My node looks like:
```python
node(
    func=split_data,
    inputs=[
        "model_input_data",
        "params:architecture",
        "params:model_options",
    ],
    outputs=[
        "X_train",
        "X_test",
        "y_train",
        "y_test",
        "sample_weights",
        "scaler",
    ],
    name="split_data_node",
    tags="split",
),
```
So I'm not sure I can avoid passing it at all without giving up the modularity of the pipeline. I think saving it as `None` would be cleaner, but maybe saving it as `0` is simpler than creating a custom dataset. Btw, pickle normally accepts `None` being saved.
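For reference, a quick check that plain pickle has no problem round-tripping `None`, so the restriction really comes from the dataset layer's `save` guard rather than from pickle itself:

```python
import pickle

# Pickle serializes and restores None without complaint.
blob = pickle.dumps(None)
assert pickle.loads(blob) is None
```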