#questions

Nandha Kumar

01/28/2024, 1:20 PM
Hey everyone, while I was actively developing in Kedro over the past year, I ran into a situation where I needed to load a dataset and write the changes back to that same dataset. I solved the overwriting by creating a new catalog entry. The main issue was that when the pipeline ran for the first time, there was no file at that path, and Kedro would fail by throwing a FileNotFound error. I worked around this by adding an empty CSV file and checking whether it had content. Is it possible to mark a catalog entry as an optional dataset, so that the exception could be handled in the dataset logic?
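To make the idea concrete, here is a minimal sketch, assuming a hypothetical `OptionalCSVDataset` class (this is not part of Kedro). Its `load()` returns `None` when the file is absent instead of raising. It uses only the standard library; in a real project the same check would presumably go into the `_load` method of a custom dataset that subclasses Kedro's `AbstractDataset`.

```python
import csv
from pathlib import Path
from typing import Optional


class OptionalCSVDataset:
    """Hypothetical dataset: a missing file loads as None rather than raising."""

    def __init__(self, filepath: str) -> None:
        self._filepath = Path(filepath)

    def load(self) -> Optional[list[dict[str, str]]]:
        # First run: nothing has been saved yet, so signal "no data".
        if not self._filepath.exists():
            return None
        with self._filepath.open(newline="") as f:
            return list(csv.DictReader(f))

    def save(self, rows: list[dict[str, str]]) -> None:
        # Write the rows back to the same path; the header comes from the first row.
        self._filepath.parent.mkdir(parents=True, exist_ok=True)
        fieldnames = list(rows[0].keys()) if rows else []
        with self._filepath.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
```

With something like this, the placeholder empty CSV would no longer be needed, because the first run simply sees `None`.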

Nok Lam Chan

01/28/2024, 1:25 PM
What would be the correct behavior if the file doesn't exist at all? Where should it load the data from?

Nandha Kumar

01/28/2024, 1:32 PM
An empty MemoryDataSet (or just a plain False/None) could be generated in its place, but the logic to handle the missing dataset would then have to be defined explicitly in the node. I realise this can work easily with pandas but not as well with Delta or Spark. It was just an idea I had.
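As a rough sketch of what that node-side handling might look like, assuming the missing dataset arrives as `None` (or `False`) and that rows are plain lists of dicts: the function name `update_records` and the `Rows` alias are made up for illustration.

```python
from typing import Optional

# Hypothetical row type: a list of records, each a dict of column -> value.
Rows = list[dict[str, str]]


def update_records(existing: Optional[Rows], new_rows: Rows) -> Rows:
    """Hypothetical node: append new rows to a dataset that may not exist yet."""
    # The optional dataset yielded None/False (or was empty), so start from scratch.
    if not existing:
        return list(new_rows)
    # Otherwise extend what was loaded from the previous run.
    return existing + list(new_rows)
```

The explicit `if not existing` branch is the part that would have to be written in every node consuming an optional dataset.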