# questions
c
Hello, are there any resources on running Kedro with Dask and Google BigQuery? I have a pandas.GBQTableDataset in my catalog, and when I run `kedro run --runner=fdr_explorer_backend.runner.DaskRunner` it fails with this error: `PicklingError: Pickling client objects is explicitly not supported. Clients have non-trivial state that is local and unpickleable.` I believe it comes from the Google packages:
- gcsfs=2023.12.2.post1
- google-cloud-bigquery=3.14.1
- pandas-gbq=0.19.2
Any ideas?
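For illustration, a hypothetical catalog entry of the kind described above, written with the kedro-datasets Python API; the dataset, table, and project names are all placeholders:

```python
# Hypothetical GBQ catalog entry; every name below is a placeholder.
from kedro_datasets.pandas import GBQTableDataset

gbq_table = GBQTableDataset(
    dataset="my_bq_dataset",    # BigQuery dataset ID
    table_name="my_table",      # BigQuery table name
    project="my-gcp-project",   # GCP project; uses the default if omitted
)
df = gbq_table.load()           # reads the table into a pandas DataFrame
```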
j
hey @Clement, could you share the complete traceback?
and also the Kedro and kedro-datasets versions you have
c
Sure, I've attached my logs and environment.yml.
Hi @Juan Luis, sorry to bring this up again. Do you have any idea how to serialize the client? I'm still having trouble figuring this out.
n
I don't think you can serialise it in this case.
Cc @Deepyaman Datta, do you think this is related to the `copy_mode` discussion we had?
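A minimal sketch (not from the thread) of why the serialisation fails: google-cloud client objects deliberately refuse to be pickled, so nothing that crosses a process boundary can carry one. The usual workaround is to construct the client inside the function that runs on the worker; `load_table` and its `table_id` argument are hypothetical names for illustration:

```python
# Sketch of the failure mode and the standard workaround.
import pickle

from google.cloud import bigquery

client = bigquery.Client()  # assumes default GCP credentials are configured

try:
    pickle.dumps(client)
except pickle.PicklingError as err:
    # "Pickling client objects is explicitly not supported..."
    print(err)

def load_table(table_id: str):
    # Build the client inside the function each worker executes, so only
    # the picklable table_id crosses process boundaries.
    local_client = bigquery.Client()
    return local_client.query(f"SELECT * FROM `{table_id}`").to_dataframe()
```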
d
Hmm... it's been a long time since I've worked with Kedro and Dask, but what are you passing between nodes? If you're following the Dask deployment guide, I believe it would be a custom `_DaskDataset` by default. Also, is this the pickling of the Google Cloud client, or the Dask client? From the traceback, I think it's the Google Cloud one.
> Cc @Deepyaman Datta, do you think this is related to the `copy_mode` discussion we had?
Probably; if you're passing something that depends on the Google Cloud client, it should probably be lazy and use `copy_mode='assign'`.
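A minimal sketch of that suggestion, assuming Kedro >= 0.19 naming (`kedro.io.MemoryDataset`) and a hypothetical dataset name: `copy_mode="assign"` hands the object to downstream nodes by reference, so Kedro never deep-copies (and therefore never pickles) the client it holds. Note this only avoids copies within a single process; it does not make the object shippable across Dask workers:

```python
# Hypothetical in-memory dataset registered with copy_mode="assign" so the
# client-bearing object is passed by reference, not deep-copied/pickled.
from kedro.io import DataCatalog, MemoryDataset

catalog = DataCatalog(
    {
        "bq_handle": MemoryDataset(copy_mode="assign"),  # hypothetical name
    }
)
```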