simon freyburger
09/29/2023, 7:59 AM
datajoely
09/29/2023, 8:40 AM
KedroContext you’ve gone too far.
Nok Lam Chan
09/29/2023, 10:09 AM
simon freyburger
09/29/2023, 10:38 AM
kedro run --indicator 3, that would do a "SELECT * FROM TABLE WHERE INDICATOR = 3", then would save to my_dataset_3.csv, and then would train model n°3 and save it under my_model_3.pkl. I have ~1000 indicators and want one model for each, so there is no real way to do it manually.
• Second, I would like to do kedro viz --indicator 3 in order for kedro_viz to display my pipeline, with e.g. dataset statistics, metrics, etc. related to my_model_3.pkl.
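To make the per-indicator naming above concrete, here is a minimal sketch. The helper name artifacts_for_indicator and the dictionary keys are hypothetical; only the query text and the file names come from the message, and nothing here is a Kedro API.

```python
def artifacts_for_indicator(indicator: int) -> dict:
    """Derive the query and output names for one indicator (hypothetical helper)."""
    # Coerce to int so the value is safe to interpolate into the query text.
    indicator = int(indicator)
    return {
        "query": f"SELECT * FROM TABLE WHERE INDICATOR = {indicator}",
        "dataset": f"my_dataset_{indicator}.csv",
        "model": f"my_model_{indicator}.pkl",
    }
```

Calling artifacts_for_indicator(3) gives the three names used in the example above, so the same function could be evaluated for each of the ~1000 indicators.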
For now, my best shot at the first part is a hook that reads the indicator from a config:
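import logging

from kedro.framework.hooks import hook_impl
from kedro.io import DataCatalog, MemoryDataSet


class DataCatalogHooks:
    @property
    def _logger(self):
        return logging.getLogger(self.__class__.__name__)

    @hook_impl
    def after_catalog_created(self, catalog: DataCatalog) -> None:
        # load_config and path_of_config_containing_indicator are defined elsewhere
        config = load_config(path_of_config_containing_indicator)
        # expose the indicator to the nodes as an in-memory dataset
        catalog.add("indicator", MemoryDataSet(data=config.indicator))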
with this pipeline:
from kedro.pipeline import Pipeline, node


def create_pipeline() -> Pipeline:
    pipeline = Pipeline(
        [
            node(
                raw,
                inputs=["indicator"],
                outputs="raw_train_data",
            ),
            node(
                func=preprocess,
                inputs=["indicator", "raw_train_data"],
                outputs="preprocess_train_data",
            ),
        ]
    )
    return pipeline


if __name__ == '__main__':
    pipelines = create_pipeline()
and write another program that basically loops over current_indicator from 1 to 1000, updates the config to have indicator: current_indicator, and calls "kedro run" as a subprocess.
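A minimal sketch of that driver loop, under stated assumptions: the function name run_for_indicators, the runner parameter and the plain "indicator: N" file layout are hypothetical, and the config path must match whatever the hook's loader reads.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, Optional, Sequence


def run_for_indicators(
    indicators: Iterable[int],
    config_path: Path,
    command: Sequence[str] = ("kedro", "run"),
    runner: Optional[Callable[[Sequence[str]], object]] = None,
) -> None:
    """Rewrite the config file and launch one run per indicator."""
    if runner is None:
        # check=True makes the loop stop at the first failing run.
        def runner(cmd: Sequence[str]) -> object:
            return subprocess.run(list(cmd), check=True)

    for current_indicator in indicators:
        # Plain "key: value" text; assumed to be what the hook's loader parses.
        config_path.write_text(f"indicator: {current_indicator}\n")
        runner(command)
```

Something like run_for_indicators(range(1, 1001), Path("indicator.yml")) would then cover all indicators sequentially, with the file name being a placeholder. Passing a custom runner makes it possible to dry-run the loop without launching Kedro.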
But I'm a bit lost on how to do the kedro viz part.
Nok Lam Chan
09/29/2023, 11:02 AM
simon freyburger
09/29/2023, 11:10 AM
from kedro.pipeline import Pipeline, node, pipeline


def create_pipeline() -> Pipeline:
    iterator_list = [1, 2]
    # one namespaced copy of the two nodes per iterator; sum() merges them
    my_pipeline = sum(
        [
            pipeline(
                [
                    node(
                        raw,
                        inputs=["indicator"],
                        outputs="raw_train_data#csv",
                    ),
                    node(
                        func=preprocess,
                        inputs=["indicator", "raw_train_data#csv"],
                        outputs="preprocess_train_data#csv",
                    ),
                ],
                namespace=str(iterator),
            )
            for iterator in iterator_list
        ]
    )
    return my_pipeline
kedro run. It will train my pipeline for iterators 1 and 2.
During the visualization, I can set iterator_list = [1] and run kedro viz to only display my first pipeline.
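One way to avoid editing iterator_list by hand between kedro run and kedro viz is to read it from the environment. This is only a sketch: the variable name INDICATORS and the helper indicators_from_env are hypothetical, not Kedro features.

```python
import os
from typing import List, Mapping, Optional


def indicators_from_env(
    default: List[int],
    env: Optional[Mapping[str, str]] = None,
    var: str = "INDICATORS",
) -> List[int]:
    """Read a comma-separated indicator list from the environment."""
    env = os.environ if env is None else env
    raw = env.get(var, "").strip()
    if not raw:
        # Variable unset or empty: fall back to the full list.
        return list(default)
    # Skip empty pieces so a trailing comma does not raise.
    return [int(part) for part in raw.split(",") if part.strip()]
```

Inside create_pipeline, iterator_list = indicators_from_env([1, 2]) would keep the default for kedro run, while launching with INDICATORS=1 kedro viz should restrict the displayed pipeline to the first namespace.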