Hugo Evers
07/03/2023, 3:46 PMHugo Evers
07/03/2023, 3:48 PMHugo Evers
07/03/2023, 3:48 PMHugo Evers
07/03/2023, 3:49 PMHugo Evers
07/03/2023, 3:49 PMNok Lam Chan
07/03/2023, 3:50 PMHugo Evers
07/03/2023, 3:50 PMHugo Evers
07/03/2023, 3:50 PMNok Lam Chan
07/03/2023, 3:50 PM
"." notation to detect if something is a namespace pipeline
Nok Lam Chan
07/03/2023, 3:50 PMHugo Evers
07/03/2023, 3:50 PMHugo Evers
07/03/2023, 3:51 PMHugo Evers
07/03/2023, 3:51 PMHugo Evers
07/03/2023, 3:51 PMNok Lam Chan
07/03/2023, 3:51 PMNok Lam Chan
07/03/2023, 3:51 PMHugo Evers
07/03/2023, 3:51 PMHugo Evers
07/03/2023, 3:52 PMNok Lam Chan
07/03/2023, 3:52 PMHugo Evers
07/03/2023, 3:52 PM
full_pipeline = train_test_pipeline + train_pipeline + evaluation_pipeline
finnish_pipeline = modular_pipeline(
    pipe=full_pipeline,
    inputs={"filtered_n_validated_data": "finnish_jobs"},
    outputs={},
    namespace="finnish",
)
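Editor's note: the snippet above wraps a pipeline in namespace="finnish" and remaps one input. As a rough illustration of how dataset names are expected to change under that wrapping, here is a stdlib-only sketch. The rename function is hypothetical and is not Kedro's implementation; it only mimics the documented naming rule (explicitly mapped names are replaced as given, params: entries get the namespace after the marker, everything else gets a "<namespace>." prefix). The dataset names come from the snippets in this thread.

```python
from typing import Dict, Optional


def rename(name: str, namespace: str, mapping: Optional[Dict[str, str]] = None) -> str:
    """Hypothetical, simplified stand-in for namespaced dataset renaming."""
    mapping = mapping or {}
    if name in mapping:
        # Explicitly mapped inputs/outputs take the mapped name, with no prefix.
        return mapping[name]
    if name == "parameters":
        # The all-parameters entry is left untouched.
        return name
    if name.startswith("params:"):
        # Single parameters keep the "params:" marker; the namespace follows it.
        return f"params:{namespace}.{name[len('params:'):]}"
    # Any other dataset is prefixed with "<namespace>.".
    return f"{namespace}.{name}"


# Names taken from the snippets in this thread.
names = ["filtered_n_validated_data", "train", "test", "params:sample_size_frac"]
mapping = {"filtered_n_validated_data": "finnish_jobs"}

resolved = {n: rename(n, "finnish", mapping) for n in names}
for old, new in resolved.items():
    print(f"{old} -> {new}")
```

Under this rule the split outputs would be looked up as finnish.train and finnish.test, so catalog entries named plain train or test would not match them.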
Hugo Evers
07/03/2023, 3:52 PMHugo Evers
07/03/2023, 3:52 PMNok Lam Chan
07/03/2023, 3:53 PM
So train and test are both memory datasets. Can you show the pipeline for this?
Hugo Evers
07/03/2023, 3:53 PM
evaluation_pipeline = modular_pipeline(
    pipe=(
        modular_pipeline(
            pipe=create_base_pipeline(inference_tag=True, **kwargs),
            inputs={"X_n_y": "test"},
            # namespace="train"
        )
        + modular_pipeline(
            pipe=test_model_pipeline,
            inputs={
                "test": "hf_dataset",  # or: train.hf_dataset
                # 'test':'test.hf_dataset',
            },
        )
    ),
    inputs={
        "finetuned_pre_trained_isco_classifier": "train.finetuned_pre_trained_isco_classifier",
        "test": "test",
        "isco_names": "isco_names",
    },
    namespace="test",
)
Hugo Evers
07/03/2023, 3:55 PM
train_test_pipeline = create_train_test_pipeline(**kwargs)
train_model_pipeline = create_train_model_pipeline(**kwargs)
test_model_pipeline = create_predict_pipeline(**kwargs)
Hugo Evers
07/03/2023, 3:55 PM
train_pipeline = modular_pipeline(
    pipe=(
        modular_pipeline(
            pipe=create_base_pipeline(**kwargs),
            inputs={"X_n_y": "train"},
            # namespace="train"
        )
        + modular_pipeline(
            pipe=train_model_pipeline,
            inputs={
                "train": "hf_dataset",  # or: train.hf_dataset
            },
        )
    ),
    inputs={
        "train": "train",
        "test": "test",
        "isco_names": "isco_names",
    },
    namespace="train",
)
Hugo Evers
07/03/2023, 3:57 PM
def create_train_test_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            node(
                func=isco_balance_train_test_split,
                inputs={
                    "df": "filtered_n_validated_data",
                    "sample_size_frac": "params:sample_size_frac",
                    **kwargs,
                },
                outputs=["train", "test"],
                name="isco_balanced_train_test_split",
            ),
        ]
    )  # type: ignore
Hugo Evers
07/03/2023, 3:58 PMNok Lam Chan
07/03/2023, 4:01 PM
inputs={
    "finetuned_pre_trained_isco_classifier": "train.finetuned_pre_trained_isco_classifier",
    "test": "test",
    "isco_names": "isco_names",
},
So you did define test as a dataset?
Hugo Evers
07/03/2023, 4:03 PMHugo Evers
07/03/2023, 4:03 PMNok Lam Chan
07/03/2023, 4:04 PMNok Lam Chan
07/03/2023, 4:04 PM
kedro catalog list will help to debug
Hugo Evers
07/03/2023, 4:05 PMHugo Evers
07/03/2023, 4:05 PMHugo Evers
07/03/2023, 4:05 PMHugo Evers
07/03/2023, 4:06 PMHugo Evers
07/03/2023, 4:06 PMHugo Evers
07/03/2023, 4:06 PMHugo Evers
07/03/2023, 4:06 PMHugo Evers
07/03/2023, 4:06 PMHugo Evers
07/03/2023, 4:07 PMNok Lam Chan
07/03/2023, 4:07 PMNok Lam Chan
07/03/2023, 4:08 PM
finnish.train and finnish.test?
Hugo Evers
07/03/2023, 4:09 PMHugo Evers
07/03/2023, 4:09 PMHugo Evers
07/03/2023, 4:09 PMHugo Evers
07/03/2023, 4:10 PMNok Lam Chan
07/03/2023, 4:10 PM
finnish.train and finnish.test?
Hugo Evers
07/03/2023, 4:10 PMHugo Evers
07/03/2023, 4:15 PMHugo Evers
07/03/2023, 4:15 PMHugo Evers
07/03/2023, 4:16 PMHugo Evers
07/03/2023, 4:16 PMHugo Evers
07/04/2023, 7:36 AMNok Lam Chan
07/04/2023, 8:38 AMHugo Evers
07/04/2023, 8:53 AMRashida Kanchwala
07/04/2023, 10:19 AMRashida Kanchwala
07/04/2023, 10:21 AMNok Lam Chan
07/04/2023, 10:26 AMNero Okwa
07/06/2023, 9:25 AM