# questions
Emilie:
Hello everyone, I'm trying to construct a node in Kedro that takes a GBQTableDataset as input, which is rather large (a few million rows and 150 columns), loads it as a DataFrame, and executes some pandas/sklearn operations on it. My problem is that the BigQuery table is too large and loading fails. What would you suggest I use? I was thinking of creating a custom dataset that reuses the GBQTableDataset code but adapts the loading part, but I'm not exactly sure how. Thanks in advance for your guidance on this topic.
d:
Hi Emilie, have you tried https://pola.rs/?
👍 1
d:
Yeah, I would try chunking, Polars, or Spark if you're hitting a wall.
👍 3
Emilie:
Hello, thank you for your replies. I will look into those options.