Hugo Evers
02/27/2023, 10:17 AM
parameters.yml
ParameterGrid:
  name_of_parameter:
    version_1:
      - value1
      - value2
    version_2:
      - value1
etc.
and now I could run through these options with the namespace.
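For illustration, the grid above can be expanded into one configuration per version with plain Python. This is only a sketch: the helper name `expand_versions` is made up, and in a real Kedro project this mapping would drive namespaced modular pipelines rather than a bare dict.

```python
# Hypothetical sketch: expand the ParameterGrid from parameters.yml into one
# configuration per version. Names are illustrative; in Kedro this mapping
# would be used to build one namespaced modular pipeline per version.
param_grid = {
    "name_of_parameter": {
        "version_1": ["value1", "value2"],
        "version_2": ["value1"],
    }
}

def expand_versions(grid):
    """Yield (version, {parameter: values}) pairs, one per version entry."""
    for param, versions in grid.items():
        for version, values in versions.items():
            yield version, {param: values}

runs = dict(expand_versions(param_grid))
```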
However, I now need dataset entries in catalog.yml that match these version_1 and version_2 names, since I don't want them to be stored in memory and then destroyed.
Instead I want to use the kedro_mlflow datasets.
So, for example, for the parquet files I would use something like:
X_test_{{ split_crit }}:
  type: kedro_mlflow.io.artifacts.MlflowArtifactDataSet
  data_set:
    type: pandas.ParquetDataSet
    filepath: s3://sagemaker-vertex/data/05_model_input/X_test_{{ split_crit }}.parquet
and for the metrics:
my_model_metrics_{{ split_crit }}:
  type: kedro_mlflow.io.metrics.MlflowMetricDataSet
  key: accuracy
and for the models
multi_modal_model:
type: kedro_mlflow.io.models.MlflowModelLoggerDataSet
flavor: mlflow.pyfunc
pyfunc_workflow: python_model
save_args:
conda_env:
python: "3.9.10"
dependencies:
- "mlflow==1.27.0"
However, in kedro these output datasets cannot be shared (even though in mlflow this would be fine).

datajoely
02/27/2023, 10:22 AM

Hugo Evers
02/27/2023, 10:24 AM
_multi_modal_model: &multi_modal_model
  type: kedro_mlflow.io.models.MlflowModelLoggerDataSet
  flavor: mlflow.pyfunc
  pyfunc_workflow: python_model
  save_args:
    conda_env:
      python: "3.9.10"
      dependencies:
        - "mlflow==1.27.0"
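For context: Kedro skips catalog entries whose names start with an underscore, so an anchor defined that way can be reused across the versioned entries with YAML merge keys. A sketch (the versioned entry names here are illustrative):

```yaml
multi_modal_model_version_1:
  <<: *multi_modal_model

multi_modal_model_version_2:
  <<: *multi_modal_model
```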
would be great

datajoely
02/27/2023, 12:31 PM
The before_pipeline_run hook gives you everything you need to modify this.
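A dependency-free sketch of the idea behind that hook: register one templated catalog entry per grid version before the run starts. All names and the path template are assumptions taken from the snippets above; a real implementation would be a method on a hooks class decorated with Kedro's @hook_impl, adding actual dataset objects to the DataCatalog rather than dicts.

```python
# Hypothetical sketch of what a before_pipeline_run hook could do:
# build one catalog entry per grid version. Entry names, the version
# list, and the s3 path template are illustrative assumptions.
def expand_catalog(catalog, versions, path_template):
    """Add one templated dataset entry per version to a dict-style catalog."""
    for split_crit in versions:
        catalog[f"X_test_{split_crit}"] = {
            "type": "kedro_mlflow.io.artifacts.MlflowArtifactDataSet",
            "data_set": {
                "type": "pandas.ParquetDataSet",
                "filepath": path_template.format(split_crit=split_crit),
            },
        }
    return catalog

catalog = expand_catalog(
    {},
    ["version_1", "version_2"],
    "s3://sagemaker-vertex/data/05_model_input/X_test_{split_crit}.parquet",
)
```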