I am about to embark on training a variety of Bayesian model Kedro #questions


Galen Seilis

08/26/2023, 5:51 AM

I am about to embark on training a variety of Bayesian models on a particular data set while trying to follow the guidance given in Bayesian workflow (https://arxiv.org/abs/2011.01808). One aspect of Bayesian workflow is to start with training simple models and work up towards more complicated models. Does anyone have thoughts/opinions on how to structure a Kedro project following Bayesian workflow?

Currently I have a data processing pipeline which does some minor processing (e.g. pd.merge on some of the separate data sources in the catalog) and an EDA pipeline that makes a datapane report (https://datapane.com/). When it comes to training multiple models iteratively, does it sound reasonable to have a pipeline per model? This isn't like a grid search or other hyperparameter tuning exercise (e.g. via https://optuna.org/) but rather each model is a thoughtful exercise based on the previously-trained models.

Given that this is a manual, by design, approach for this particular collection of models, would you have separate pipelines or just separate with tags or have distinct models on different git branches or have separate kedro projects? Or something else? I'm sure there are many good and bad ideas. What would you do (and why, if you feel you can explain why)?

Galen Seilis

08/26/2023, 5:52 AM

I don't know that it matters, but for this project I am developing these Bayesian models using PyMC 5.

Juan Luis

08/26/2023, 9:12 AM

my gut feeling is that I'd go for the simplest approach, which in my mind is: one pipeline per model (since every model might be different) but all of them in a single Kedro project (because (1) probably they won't be that different and (2) conceptually they're part of the same workflow)

👍 1
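
For illustration, the "one pipeline per model, one project" layout suggested above could look roughly like the sketch below, written as a project's pipeline_registry.py. The model names (pooled, hierarchical), the dataset names (model_input, <name>_idata) and the placeholder fit functions are all hypothetical and not from the thread. Kedro is imported inside the wiring function only so the plain node functions stay importable on their own; in a real project the import would normally sit at the top of the file.

```python
"""Sketch of a pipeline_registry.py with one Kedro pipeline per model."""


def fit_pooled(model_input):
    """Hypothetical placeholder: fit the simplest model (e.g. with PyMC)."""
    raise NotImplementedError


def fit_hierarchical(model_input):
    """Hypothetical placeholder: a later, more complex model."""
    raise NotImplementedError


# Hypothetical model names, ordered from simplest to most complex.
FIT_FUNCTIONS = {
    "pooled": fit_pooled,
    "hierarchical": fit_hierarchical,
}


def create_model_pipeline(name):
    """Wrap one model's fit function in its own single-node pipeline."""
    from kedro.pipeline import node, pipeline

    return pipeline(
        [
            node(
                func=FIT_FUNCTIONS[name],
                inputs="model_input",      # assumed catalog entry
                outputs=f"{name}_idata",   # assumed catalog entry
                name=f"fit_{name}",
            )
        ]
    )


def register_pipelines():
    """Return a mapping from pipeline name to pipeline, one per model."""
    pipelines = {name: create_model_pipeline(name) for name in FIT_FUNCTIONS}
    # Running without --pipeline executes every registered model.
    pipelines["__default__"] = sum(pipelines.values())
    return pipelines
```

With a registry like this, a single model can be run with `kedro run --pipeline pooled`, and adding the next model in the workflow means adding one function and one dictionary entry.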

Juan Luis

08/26/2023, 9:12 AM

switching git branches sounds like too much work for me

👍 1

Juan Luis

08/26/2023, 9:13 AM

and then if you want to separate them by tags it's up to you, but that's more useful when you want to run several Kedro pipelines at once. for example "all models that use x technique"

👍 1
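
A minimal sketch of the tagging idea, under the same caveats: the model names, tag names and dataset names are made up for illustration, and fit_model stands in for whatever each model's real fit function would be. It assumes tags are attached per pipeline via the tags argument of kedro.pipeline.pipeline, which applies them to every node in that pipeline.

```python
"""Sketch: tagging per-model pipelines so groups of models can run together."""

# Hypothetical model names mapped to hypothetical technique tags.
MODEL_TAGS = {
    "pooled": ["baseline"],
    "hierarchical": ["partial_pooling"],
    "hierarchical_noncentered": ["partial_pooling", "noncentered"],
}


def models_with_tag(tag):
    """List the model names that carry the given tag."""
    return [name for name, tags in MODEL_TAGS.items() if tag in tags]


def fit_model(model_input):
    """Hypothetical placeholder for a model-specific fit function."""
    raise NotImplementedError


def create_tagged_pipelines():
    """Build one single-node pipeline per model, tagged by technique."""
    from kedro.pipeline import node, pipeline

    return {
        name: pipeline(
            [
                node(
                    func=fit_model,
                    inputs="model_input",      # assumed catalog entry
                    outputs=f"{name}_idata",   # assumed catalog entry
                    name=f"fit_{name}",
                )
            ],
            tags=tags,  # applied to every node in this pipeline
        )
        for name, tags in MODEL_TAGS.items()
    }
```

If these pipelines are summed into `__default__`, then `kedro run --tags=partial_pooling` should run only the two hierarchical models; the same selection is available in code through `Pipeline.only_nodes_with_tags("partial_pooling")`.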

Galen Seilis

08/26/2023, 7:48 PM

@Juan Luis Great, thank you. I found that feedback useful.

