Rachid Cherqaoui
06/20/2025, 11:21 AM /doc_20250620*_delta.csv
But I noticed that YAML interprets * as an anchor, and it doesn't seem to behave like a wildcard here.
How can I configure a dataset in catalog.yml to use a wildcard when loading files from an SFTP path (e.g. to only fetch files starting with a certain prefix and ending with _delta.csv)? Is there native support for this kind of pattern in Kedro's SFTPDataSet or do I need to implement a custom dataset?
Any guidance or examples would be super appreciated!
Sajid Alam
06/20/2025, 11:45 AM PartitionedDataSet to handle wildcard paths with SFTP.
Rachid Cherqaoui
06/20/2025, 12:00 PM PartitionedDataSet might take a long time to load or list all those files. Do you know if there's a way to optimize that, or limit how many files it tries to process at once?
Sajid Alam
06/20/2025, 12:23 PM PartitionedDataset lists the directory once, but I don't believe it opens any file until your node calls the load function for that partition.
Rachid Cherqaoui
06/20/2025, 12:39 PM
Juan Luis
06/20/2025, 3:15 PM /doc_20250620*_delta.csv in double quotes, YAML won't treat * as an anchor (but I'm not sure)
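
To illustrate the lazy-loading point from the thread, here is a minimal sketch of filtering partitions inside a node. It assumes the node receives what PartitionedDataset normally hands over: a dict mapping partition ids to load callables. The pattern, the file names, and the `iter_matching` helper are hypothetical, and the fake loaders only stand in for real SFTP reads. Only the standard library is used, so this is not Kedro's own API.

```python
from fnmatch import fnmatchcase
from typing import Any, Callable, Dict, Iterator, List, Tuple

# Hypothetical glob, modelled on the file names mentioned in the thread.
PATTERN = "doc_20250620*_delta.csv"


def iter_matching(
    partitions: Dict[str, Callable[[], Any]], pattern: str = PATTERN
) -> Iterator[Tuple[str, Any]]:
    """Yield (partition_id, data) only for ids whose file name matches the glob."""
    for partition_id in sorted(partitions):
        # Partition ids may contain sub-directories; match on the last segment.
        file_name = partition_id.rsplit("/", 1)[-1]
        if fnmatchcase(file_name, pattern):
            # The loader is called only here, so skipped files are never opened.
            yield partition_id, partitions[partition_id]()


# Stand-in for the dict a node would receive (assumption: ids are file names).
opened: List[str] = []


def fake_loader(name: str) -> Callable[[], str]:
    def load() -> str:
        opened.append(name)  # record which "files" were actually read
        return f"contents of {name}"

    return load


names = [
    "doc_20250620_001_delta.csv",
    "doc_20250620_002_delta.csv",
    "doc_20250619_001_delta.csv",
    "doc_20250620_001_full.csv",
]
partitions = {name: fake_loader(name) for name in names}
loaded = dict(iter_matching(partitions))
```

After this runs, `opened` contains only the two names that match the pattern; the other two loaders are never called. If the dataset is configured with `filename_suffix`, the partition ids should no longer carry that suffix, so the pattern would need adjusting. On the YAML side, `*` is the alias indicator (`&` is the anchor) and only has that meaning at the start of a value, so quoting the path is a safe way to keep it a plain string.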