# questions
t
Hi all! I am working with a clustering pipeline that I regularly want to rerun to monitor cluster migrations. I am using SnowparkTableDataset to save data directly to the data warehouse. Since Kedro does not allow the same dataset to be both an input and an output, I was wondering what the best practice would be for rerunning the clustering and storing the results in the same SnowparkTableDataset, for example under a different timestamp each run. Would appreciate your help here!
r
Hi @Thomas d'Hooghe, for your use case, PartitionedDataset and IncrementalDataset might be helpful. If you haven't already, please check the docs here - https://docs.kedro.org/en/stable/data/partitioned_and_incremental_datasets.html . Also, if your clustering pipeline runs on the entire dataset and you want to work with different versions, you can try versioning in the catalog. Thank you
t
Hi Ravi, thank you for your quick response! That looks promising indeed. Any chance you have already tested this with a SnowparkTableDataset? Regarding versioning, I thought that, with or without it, it is not possible to have the same dataset as both input and output. Are you saying that versioning lifts this constraint?
r
Oh yes, I think Kedro does not allow the same dataset to be both an input and an output. I haven't tried incremental datasets before. Also, I want to check that I understood your question: 1. You have a pipeline with a node that takes dataset x -> dataset x? or 2. You have a pipeline with a node that takes dataset x -> dataset x_with_timestamp, and the next iteration would then take dataset x_with_timestamp as input?
t
I think both would work, but the latter would be a bit cleaner. I am also wondering what the community thinks the best solution would be in this case :)
m
You cannot have the same input and output datasets, but two differently named data catalog entries can point to the same underlying resource (file, database, etc.).
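To make the last suggestion concrete, here is a minimal sketch of two differently named catalog entries that point to the same underlying table. It is not Kedro code: `TableDataset`, the `catalog` dict, `recluster`, `run_pipeline`, and the `clusters` table are all hypothetical stand-ins, and an in-memory SQLite database takes the place of Snowflake so the example runs anywhere. In a real project the two entries would presumably be two SnowparkTableDataset entries in the catalog that share the same table, with the write side configured to append rather than overwrite; check the dataset's documentation for the exact options.

```python
import sqlite3


class TableDataset:
    """Hypothetical stand-in for a table-backed dataset (not the Kedro API)."""

    def __init__(self, conn, table_name):
        self._conn = conn
        self._table = table_name

    def load(self):
        # Read every row currently stored in the table.
        query = f"SELECT customer_id, cluster, run_ts FROM {self._table}"
        return self._conn.execute(query).fetchall()

    def save(self, rows):
        # Append rows instead of overwriting, so earlier runs are kept.
        self._conn.executemany(
            f"INSERT INTO {self._table} VALUES (?, ?, ?)", rows
        )
        self._conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clusters (customer_id INTEGER, cluster INTEGER, run_ts TEXT)"
)

# Two differently named entries, one underlying table.
catalog = {
    "clusters_history": TableDataset(conn, "clusters"),  # used as node input
    "clusters_new_run": TableDataset(conn, "clusters"),  # used as node output
}


def recluster(history, run_ts):
    """Toy node: assign clusters and stamp each row with the run timestamp."""
    previous_runs = len({ts for _, _, ts in history})
    # The assignment shifts with each run, purely to imitate cluster migration.
    return [(cid, (cid + previous_runs) % 2, run_ts) for cid in (1, 2, 3)]


def run_pipeline(run_ts):
    # Input and output have different names, so the "same dataset" rule holds.
    history = catalog["clusters_history"].load()
    catalog["clusters_new_run"].save(recluster(history, run_ts))


run_pipeline("2024-01-01T00:00:00")
run_pipeline("2024-02-01T00:00:00")

all_rows = catalog["clusters_history"].load()
run_timestamps = sorted({ts for _, _, ts in all_rows})
```

After two runs the single table holds the rows from both timestamps, so cluster migrations can be tracked by comparing `run_ts` groups. One thing to keep in mind with this pattern: the pipeline no longer sees a dependency between the two entries, so any ordering between nodes that read and write the shared table has to be arranged separately.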