Kedro is an open-source Python framework for creating maintainable and modular data science code.

Hey everyone, while I was actively developing with Kedro over the past year, I ran into a situation where I needed to load a dataset and then write the changes back to that same dataset. I solved the overwriting by creating a new catalog entry. The main issue, however, was that when the pipeline ran for the first time there was no file at that path, so Kedro failed with a `FileNotFound` error. I worked around this by adding an empty CSV file and checking whether it had content. Would it be possible to mark a catalog entry as an optional dataset, so that the exception could be handled in the dataset logic?

What would be the correct behavior if the file doesn't exist at all? Where should it load the data from?

An empty MemoryDataSet (or just a plain False/None) could be generated in that case; however, the logic to handle the missing dataset would have to be explicitly defined in the node. I realise this can work easily with pandas but not as well with Delta or Spark; it was just an idea I had.
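
To make the idea concrete, here is a minimal, standard-library-only sketch. It is not Kedro's API: `OptionalCSVDataset` and `append_rows` are hypothetical names I made up to illustrate the behaviour, where `load()` returns `None` when the file is missing and the node decides what that means.

```python
import csv
from pathlib import Path
from typing import Dict, List, Optional

Rows = List[Dict[str, str]]


class OptionalCSVDataset:
    """Hypothetical stand-in for an 'optional' catalog entry (not a Kedro class)."""

    def __init__(self, filepath: str) -> None:
        self._filepath = Path(filepath)

    def load(self) -> Optional[Rows]:
        # Return None instead of raising when the file does not exist yet.
        if not self._filepath.exists():
            return None
        with self._filepath.open(newline="") as f:
            return list(csv.DictReader(f))

    def save(self, rows: Rows) -> None:
        # Nothing to write: leave the path untouched.
        if not rows:
            return
        self._filepath.parent.mkdir(parents=True, exist_ok=True)
        with self._filepath.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)


def append_rows(existing: Optional[Rows], new_rows: Rows) -> Rows:
    """Node function: the missing-dataset case is handled explicitly here."""
    if existing is None:
        # First run: there is no previous data to merge with.
        return list(new_rows)
    return existing + list(new_rows)
```

On the first run `load()` yields `None` and the node starts from the new rows alone; on later runs it appends to whatever was saved before. A real implementation would presumably subclass Kedro's dataset base class instead, and the Spark/Delta case would need a different existence check.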