# questions
Please correct me if I’m wrong, but it looks like Kedro’s implementation has slightly overlooked the input dataset as a differentiating factor for an experiment. That is, Kedro doesn’t treat a different input dataset as a different session/experiment run.
Do you mean in the context of Kedro-Viz experiment tracking?
Yes, for example. How do you store two experiments, with the same code but different data, side by side?
That is a simple execution/flow question. The second question is one of methodology: how do you compare them?
Since the metrics and JSON outputs can be versioned, every time you do
kedro run
the tracking outputs you've defined will be saved in a different directory, named using the timestamp of the run. So the question is how to identify which input dataset was used for each run, am I right?
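One way to make each run identify its own input is to save a small record describing the input file next to the metrics. The sketch below is only an illustration using the Python standard library: `describe_input` is a hypothetical helper, not part of Kedro, and it assumes the input is a single local file.

```python
import hashlib
from pathlib import Path


def describe_input(path):
    """Return a JSON-serialisable record that identifies an input file.

    Hypothetical helper (not a Kedro API): it fingerprints the file so two
    runs on different data can be told apart later.
    """
    p = Path(path)
    content = p.read_bytes()
    return {
        "input_path": str(p),
        "input_bytes": len(content),
        # The hash changes whenever the file content changes.
        "input_sha256": hashlib.sha256(content).hexdigest(),
    }
```

If a node returned this dictionary and its output were declared as a versioned tracking JSON dataset in the catalog (an assumption about your setup), the record would land in the same timestamped run as the metrics, so runs on different data could be told apart.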
Yes, and how to easily specify input datasets without modifying files, e.g.
kedro run --name=my_first_exp --input-dataset=my_first_dataset.parquet
kedro run --name=my_second_exp --input-dataset=my_second_dataset.parquet
kedro viz
Something along these lines.
Modular pipelines allow you to reuse the same pipeline structure for different inputs. You could then register separate pipelines, each running against a different input already defined in your catalog. This is similar to a question that was asked a few days ago.
Also, @Ofir, you might want to use
kedro run --from-inputs
Thanks a lot!