# questions
c
Hey all, we're currently running MLflow experiments using a Kedro pipeline. The pipeline produces intermediate datasets. I'd like to run multiple experiments concurrently while avoiding file collisions. What is the best approach for doing this in Kedro? Does anyone know whether we can refer to
params
in the
catalog.yml
in order to make the paths dynamic?
e
you could use Jinja + namespaces
e
Following to learn 🙂 not sure how Jinja + namespaces work.
c
Hey @Erwin do you have an example of using Jinja in the catalog by chance?
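[Editor's note: for reference, a minimal sketch of Jinja in the catalog. It assumes a config loader with Jinja2 support enabled (e.g. Kedro's TemplatedConfigLoader); the dataset names, type, and paths are illustrative, not from this thread:]

```yaml
# catalog.yml -- hypothetical sketch; assumes the project's config
# loader has Jinja2 support enabled (e.g. TemplatedConfigLoader)
{% for exp in ["exp_a", "exp_b"] %}
{{ exp }}.intermediate_data:
  type: pandas.ParquetDataSet
  # each experiment writes under its own directory, avoiding collisions
  filepath: data/02_intermediate/{{ exp }}/intermediate_data.parquet
{% endfor %}
```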
d
Are you using the
kedro-mlflow
plug-in?
and I think hooks are the best way to do this if you're not
l
@datajoely small typo but surely you meant
kedro-mlflow
plugin 😉
👍 1
Regarding referring to
params
to make paths dynamic, using namespaces: few days ago a similar question was asked and in the reply thread we might have a possible implementation for you https://kedro-org.slack.com/archives/C03RKP2LW64/p1692872422344789 Without any additional work, no you cannot refer to
params
in the
catalog.yml
at the moment. -- However, assuming you are indeed using the
kedro-mlflow
plugin and logging mlflow artifacts using the
kedro_mlflow.io.artifacts.MlflowArtifactDataSet
- then I wouldn't think you need to refer to your params.yml to make the filepath dynamic, since the dataset would be logged as an artifact to the mlflow run with the params you need 🤔 If you share some more about how you are setting up your concurrent runs, I suspect you could get away with just using
namespaces
by following the example in the docs here: https://docs.kedro.org/en/stable/data/data_catalog.html#example-3-generalise-datasets-using-namespaces-into-one-dataset-factory
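[Editor's note: the namespace/dataset-factory pattern that docs page describes looks roughly like this; the dataset type and filepath below are illustrative:]

```yaml
# catalog.yml -- one dataset factory pattern serving every namespace
# (dataset type and filepath are illustrative)
"{namespace}.intermediate_data":
  type: pandas.ParquetDataSet
  # {namespace} is substituted per pipeline namespace at resolution time
  filepath: data/02_intermediate/{namespace}/intermediate_data.parquet
```

Each namespaced dataset (e.g. `exp_a.intermediate_data`) then resolves to its own filepath, so concurrent runs write to separate directories.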
d
yes, kedro-mlflow 🤦