Kedro is an open-source Python framework for creating maintainable and modular data science code.

Kedro

Hi everyone, is there a way to cache an `SQLQueryDataSet` so it doesn't take time to fetch the same data every time the pipeline runs? Thanks in advance.

Between separate pipeline runs there’s probably no built-in way, but within a single pipeline run you can use: <https://docs.kedro.org/en/stable/kedro.io.CachedDataset.html#kedro.io.CachedDataset>
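To illustrate why `CachedDataset` only helps within one run, here is a minimal pure-Python sketch of the same idea. It is not Kedro's code: the hypothetical `InMemoryCache` wrapper runs its loader once and keeps the result in memory, so the cache vanishes when the process (the pipeline run) ends. `slow_query` is a made-up stand-in for the SQL query.

```python
from typing import Any, Callable


class InMemoryCache:
    """Hypothetical wrapper illustrating the idea behind CachedDataset:
    the wrapped loader runs at most once per process; later loads reuse it."""

    _MISSING = object()  # sentinel, so a loader may legitimately return None

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._value: Any = self._MISSING
        self.load_calls = 0  # how many times the underlying loader actually ran

    def load(self) -> Any:
        if self._value is self._MISSING:
            self.load_calls += 1
            self._value = self._loader()
        return self._value

    def release(self) -> None:
        # Forget the cached value; the next load() hits the source again.
        self._value = self._MISSING


def slow_query() -> list:
    # Stand-in for an expensive SQL query.
    return [("a", 1), ("b", 2)]


cached = InMemoryCache(slow_query)
first = cached.load()
second = cached.load()  # served from memory; slow_query is not called again
print(cached.load_calls)  # 1
```

Because the value lives only in the process's memory, a fresh `kedro run` starts with an empty cache, which matches the "single pipeline run" limitation mentioned above.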

If you’re interested in building functionality to cache the results of `SQLQueryDataSet` (e.g. to disk), you can extend this class. Keep in mind that you will then expose yourself to all kinds of cache-invalidation problems :slightly_smiling_face:
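A minimal sketch of disk caching with the simplest possible invalidation rule, a time-to-live. To avoid guessing at Kedro internals it does not subclass `SQLQueryDataSet`; it shows only the caching logic on top of the standard-library `sqlite3` module. `DiskCachedQuery`, `ttl_seconds` and the JSON cache format are all hypothetical choices for illustration.

```python
import hashlib
import json
import os
import sqlite3
import tempfile
import time
from contextlib import closing
from pathlib import Path
from typing import Any, List


class DiskCachedQuery:
    """Hypothetical sketch: cache a SQL query's rows as JSON on disk, keyed by
    a hash of the query text, and treat the cache as stale after ttl_seconds."""

    def __init__(self, sql: str, db_path: str, cache_dir: str, ttl_seconds: float) -> None:
        self._sql = sql
        self._db_path = db_path
        self._ttl = ttl_seconds
        key = hashlib.sha256(sql.encode("utf-8")).hexdigest()
        self._cache_file = Path(cache_dir) / f"{key}.json"
        self.db_hits = 0  # how many times this instance queried the database

    def _is_fresh(self) -> bool:
        # Time-based invalidation only: changes to the table are NOT detected.
        if not self._cache_file.exists():
            return False
        age = time.time() - self._cache_file.stat().st_mtime
        return age < self._ttl

    def load(self) -> List[List[Any]]:
        if self._is_fresh():
            return json.loads(self._cache_file.read_text())
        self.db_hits += 1
        with closing(sqlite3.connect(self._db_path)) as conn:
            rows = [list(row) for row in conn.execute(self._sql).fetchall()]
        # Only JSON-serialisable column values (text, numbers, NULL) are supported.
        self._cache_file.parent.mkdir(parents=True, exist_ok=True)
        self._cache_file.write_text(json.dumps(rows))
        return rows


# Demo: a throwaway SQLite database standing in for the real one.
tmp = tempfile.mkdtemp()
db = os.path.join(tmp, "demo.db")
with closing(sqlite3.connect(db)) as conn:
    conn.execute("CREATE TABLE t (name TEXT, n INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", [("a", 1), ("b", 2)])
    conn.commit()

sql = "SELECT name, n FROM t ORDER BY n"
cache_dir = os.path.join(tmp, "cache")
run1 = DiskCachedQuery(sql, db, cache_dir, ttl_seconds=3600)
rows1 = run1.load()  # queries the database and writes the cache file
run2 = DiskCachedQuery(sql, db, cache_dir, ttl_seconds=3600)  # e.g. the next run
rows2 = run2.load()  # read from disk, no database access
print(rows1, run1.db_hits, run2.db_hits)  # [['a', 1], ['b', 2]] 1 0
```

The TTL is the bluntest form of invalidation: inside the window, updates to the table go unnoticed, which is exactly the kind of problem warned about above. Wiring this into Kedro would mean overriding the dataset's load method and falling back to the parent implementation on a cache miss; in Kedro 0.18-era datasets that method is `_load`, but check the source of the version you use.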

Thank you, will check this out. :relaxed:

Does it come with Kedro, or do I need to install it manually? I'm having a problem:
```
SQLQueryDataSet.__init__() got an unexpected keyword argument 'layer'.
Dataset '_cached' must only contain arguments valid for the constructor of 'kedro_datasets.pandas.sql_dataset.SQLQueryDataSet'..
```

The error is self-explanatory: `SQLQueryDataSet` doesn't accept a `layer` argument, so remove `layer` from the config that gets passed to it.