Danhua Yan
01/06/2023, 5:03 PM
pandas.ParquetDataSet, spark.SparkDataSet, pickle.PickleDataSet, and using yml configs to save:
dataset:
  type: pandas.ParquetDataSet
  filepath: some_path
  versioned: true
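For context, a minimal sketch of where a save with versioned: true ends up, assuming the layout described in the Kedro documentation (`<filepath>/<version>/<filename>`, with a UTC timestamp of the form `YYYY-MM-DDThh.mm.ss.sssZ` as the version). The helper names `make_version` and `versioned_path` are hypothetical and only mimic that layout; they are not Kedro functions.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath

# Assumption: mirrors the timestamp layout the Kedro docs describe for versions.
VERSION_FORMAT = "%Y-%m-%dT%H.%M.%S.%fZ"


def make_version(now: datetime) -> str:
    """Format a datetime as a UTC version string with millisecond precision."""
    stamp = now.astimezone(timezone.utc).strftime(VERSION_FORMAT)
    # %f yields six digits (microseconds); drop the last three and re-append "Z".
    return stamp[:-4] + "Z"


def versioned_path(filepath: str, version: str) -> PurePosixPath:
    """Build <filepath>/<version>/<filename> for a versioned dataset."""
    path = PurePosixPath(filepath)
    return path / version / path.name


if __name__ == "__main__":
    version = make_version(datetime.now(timezone.utc))
    print(versioned_path("some_path", version))
```

The version string is a plain timestamp here, which is why a name such as a config label does not appear in the output path by default.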
Deepyaman Datta
01/06/2023, 5:08 PM
Danhua Yan
01/06/2023, 5:42 PM
{timestamp}_conf_v1, {timestamp}_conf_v2 etc. so it’s easier to analyze the output. I know MLflow could probably do this but want to see if there’s an option without changing the source code.
Deepyaman Datta
01/06/2023, 6:07 PM
Danhua Yan
01/06/2023, 6:13 PM
Elias WILLEMSE
01/08/2023, 11:50 AM
PartitionedDataSet (link below)?
You can then generate a key, and it will save accordingly.
PS. We’ve also been looking into this use case.
https://kedro.readthedocs.io/en/stable/kedro.io.PartitionedDataSet.html
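A rough sketch of the "generate a key" idea: when saving, PartitionedDataSet expects a dictionary that maps partition ids to data, so a node can return a one-entry dict whose key carries the timestamp and the config label. The function names `partition_key` and `as_partition`, the `conf_v1` label, and the timestamp format are assumptions made for illustration; none of them come from the thread or from Kedro.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def partition_key(run_time: datetime, conf_name: str) -> str:
    """Build a key such as 20230106T174200_conf_v1 (format is an assumption)."""
    return f"{run_time.strftime('%Y%m%dT%H%M%S')}_{conf_name}"


def as_partition(data: Any, run_time: datetime, conf_name: str) -> Dict[str, Any]:
    """Wrap a node output in the {partition_id: data} dict that a
    PartitionedDataSet output expects when saving."""
    return {partition_key(run_time, conf_name): data}


if __name__ == "__main__":
    # Placeholder payload; in a pipeline this would be the node's real output.
    output = as_partition({"rows": 3}, datetime.now(timezone.utc), "conf_v1")
    print(list(output))
```

With the output registered as a PartitionedDataSet in the catalog, each run would then write a new partition named after its key instead of overwriting the previous one. This does require the node to return the dict, so it is not entirely free of source-code changes.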