#questions

Emilie Gourmelen

12/22/2023, 8:25 AM
Hello everyone, I'm trying to construct a node in Kedro which takes a GBQTableDataset as input, loads it as a dataframe, and executes some pandas/sklearn operations on it. The table is rather large (a few million rows and 150 columns), and my problem is that loading fails because the BigQuery table is too big. What would you suggest using? I was thinking of creating a custom dataset that reuses the GBQTableDataset code, adapting the loading part, but I'm not exactly sure how. Thanks in advance for your guidance on this topic.
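One way the "custom dataset, adapted loading" idea could look, as a minimal sketch: subclass Kedro's AbstractDataset and stream the table page by page with the google-cloud-bigquery client, instead of materialising everything at once. The class name ChunkedGBQTableDataset and the chunk_size parameter below are illustrative, not part of kedro-datasets.

```python
from typing import Any, Iterator

import pandas as pd
from google.cloud import bigquery
from kedro.io import AbstractDataset


class ChunkedGBQTableDataset(AbstractDataset):
    """Loads a BigQuery table as an iterator of pandas DataFrame chunks."""

    def __init__(self, dataset: str, table_name: str, project: str, chunk_size: int = 100_000):
        self._dataset = dataset
        self._table_name = table_name
        self._project = project
        self._chunk_size = chunk_size

    def _load(self) -> Iterator[pd.DataFrame]:
        client = bigquery.Client(project=self._project)
        table_id = f"{self._project}.{self._dataset}.{self._table_name}"
        # list_rows pages through the table; to_dataframe_iterable yields
        # one DataFrame per page, so only one chunk is in memory at a time.
        rows = client.list_rows(table_id, page_size=self._chunk_size)
        return rows.to_dataframe_iterable()

    def _save(self, data: Any) -> None:
        raise NotImplementedError("This dataset is read-only.")

    def _describe(self) -> dict[str, Any]:
        return {"dataset": self._dataset, "table_name": self._table_name}
```

The consuming node would then loop over the chunks and combine partial results, since most pandas/sklearn operations can't run on an iterator directly.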

Dmitry Sorokin

12/22/2023, 9:51 AM
Hi Emilie, have you tried https://pola.rs/?
👍 1

datajoely

12/22/2023, 10:39 AM
Yeah, I would try chunking, Polars, or Spark if you're hitting a wall
👍 3
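For the Polars route, a minimal sketch of what that could look like: pull the query result as an Arrow table with the google-cloud-bigquery client and hand it to Polars, which wraps Arrow memory directly and is typically much lighter than a pandas DataFrame. The project, dataset, and table names are placeholders.

```python
import polars as pl
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project id

# Fetch the query result as an Arrow table, then wrap it in a Polars
# DataFrame without an intermediate pandas conversion.
arrow_table = client.query(
    "SELECT * FROM `my-project.my_dataset.my_table`"  # placeholder table
).to_arrow()

df = pl.from_arrow(arrow_table)
print(df.shape)
```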

Emilie Gourmelen

12/22/2023, 4:05 PM
Hello, thank you for your replies. I will look at those options.