Danhua Yan
01/06/2023, 5:03 PM
pandas.ParquetDataSet, spark.SparkDataSet, pickle.PickleDataSet, and using yml configs to save:
dataset:
  type: pandas.ParquetDataSet
  filepath: some_path
  versioned: true
Deepyaman Datta
01/06/2023, 5:08 PM
Deepyaman Datta
01/06/2023, 5:10 PM
Danhua Yan
01/06/2023, 5:42 PM
{timestamp}_conf_v1, {timestamp}_conf_v2, etc., so it’s easier to analyze the output. I know MLflow could probably do this, but I want to see if there’s an option without changing the source code.
Deepyaman Datta
01/06/2023, 6:07 PM
Deepyaman Datta
01/06/2023, 6:11 PM
Danhua Yan
01/06/2023, 6:13 PM
Elias WILLEMSE
01/08/2023, 11:50 AM
PartitionedDataSet (link below)?
You can then generate a key, and it will save accordingly.
PS. We’ve also been looking into this use case.
https://kedro.readthedocs.io/en/stable/kedro.io.PartitionedDataSet.html
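
A rough sketch of the "generate a key" idea, in case it helps. When a node output is mapped to a PartitionedDataSet, the node can return a dict, and each dict key is saved as its own partition. The helper names (`make_partition_key`, `tag_output`) and the timestamp format below are assumptions for illustration only; they are not part of Kedro.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def make_partition_key(conf_name: str, now: datetime) -> str:
    """Build a key like '<timestamp>_<conf_name>' (the format is an assumption)."""
    timestamp = now.strftime("%Y-%m-%dT%H.%M.%S")
    return f"{timestamp}_{conf_name}"


def tag_output(
    data: Any, conf_name: str, now: Optional[datetime] = None
) -> Dict[str, Any]:
    """Hypothetical node function: wrap the output in a one-entry dict.

    Returning {partition_key: data} from a node whose output is a
    PartitionedDataSet causes the data to be saved under that key.
    """
    if now is None:
        now = datetime.now(tz=timezone.utc)
    return {make_partition_key(conf_name, now): data}


if __name__ == "__main__":
    fixed = datetime(2023, 1, 6, 17, 42, 0, tzinfo=timezone.utc)
    print(tag_output([1, 2, 3], "conf_v1", now=fixed))
    # prints {'2023-01-06T17.42.00_conf_v1': [1, 2, 3]}
```

The catalog entry for that output would then be a PartitionedDataSet pointing at a folder, with the underlying dataset type (for example pandas.ParquetDataSet) set for the individual partitions. The config name still has to reach the node somehow, for instance as a parameter, so this is a starting point rather than a drop-in solution.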