#questions

Clement

01/15/2024, 4:45 PM
Hello, are there resources on running Kedro with Dask and Google BigQuery? I have a pandas.GBQTableDataset catalog entry, and when I run kedro run --runner=fdr_explorer_backend.runner.DaskRunner it fails with this error: PicklingError: Pickling client objects is explicitly not supported. Clients have non-trivial state that is local and unpickleable. I believe it comes from the Google packages:
- gcsfs=2023.12.2.post1
- google-cloud-bigquery=3.14.1
- pandas-gbq=0.19.2
Any ideas?
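For context: that error message is raised by the google-cloud client itself, which explicitly refuses to be pickled. A minimal reproduction outside Kedro and Dask, with a hypothetical project id (anonymous credentials just avoid needing real auth):

```python
import pickle

from google.auth.credentials import AnonymousCredentials
from google.cloud import bigquery

# Hypothetical project id; no network calls happen at construction time.
client = bigquery.Client(project="my-project", credentials=AnonymousCredentials())

# google-cloud clients define __getstate__ to refuse pickling, so this raises
# PicklingError: Pickling client objects is explicitly not supported. ...
pickle.dumps(client)
```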

Juan Luis

01/15/2024, 4:50 PM
hey @Clement, could you share the complete traceback?
and also the Kedro and kedro-datasets versions you have

Clement

01/15/2024, 4:54 PM
Sure, attached are my logs and environment.yml
environment.yml
Hi @Juan Luis, sorry to bring this up again. Do you have any idea how to serialize the client? I'm still having trouble figuring this out.

Nok Lam Chan

01/24/2024, 4:11 PM
I don't think you can serialise in this case.
Cc @Deepyaman Datta Do you think this is related to the copy_mode discussion we had?
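A common workaround, sketched below under hypothetical names, is to avoid serialising the client at all: construct it inside the node, so it lives and dies on the Dask worker and only the picklable result crosses node boundaries.

```python
import pandas as pd
from google.cloud import bigquery


def load_table(project: str, query: str) -> pd.DataFrame:
    # Hypothetical Kedro node: the client is created here, on the worker,
    # and is never pickled; only the resulting DataFrame is passed downstream.
    client = bigquery.Client(project=project)
    return client.query(query).to_dataframe()
```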

Deepyaman Datta

01/24/2024, 5:35 PM
Hmm... it's been a long time since I've worked with Kedro and Dask, but what are you passing between nodes? If you're following the Dask deployment guide, I believe it would be a custom _DaskDataset by default. Also, is this pickling the Google Cloud client, or the Dask client? I think the Google Cloud one, from the traceback.
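For reference, the custom dataset in Kedro's Dask deployment guide shares node outputs via Dask's published-dataset mechanism. A rough paraphrase (from memory, not quoted from the guide):

```python
from typing import Any

from distributed import Client
from kedro.io import AbstractDataset


class _DaskDataset(AbstractDataset):
    """Shares node outputs as named datasets published on the Dask scheduler."""

    def __init__(self, name: str):
        self._name = name

    def _load(self) -> Any:
        # Fetch the named dataset from the scheduler on the consuming worker.
        return Client.current().get_dataset(self._name)

    def _save(self, data: Any) -> None:
        # Publish the node output under a name visible to all workers.
        Client.current().publish_dataset(data, name=self._name)

    def _exists(self) -> bool:
        return self._name in Client.current().list_datasets()

    def _release(self) -> None:
        Client.current().unpublish_dataset(self._name)

    def _describe(self) -> dict[str, Any]:
        return {"name": self._name}
```

Publishing goes through the scheduler, so serialisation is involved: anything holding a live Google Cloud client would hit the same PicklingError here, which would be consistent with the traceback.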
> Cc @Deepyaman Datta Do you think this is related to the copy_mode discussion we had?
Probably; if you're passing something that depends on the Google Cloud client, it will probably be lazy + need copy_mode='assign'.
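For completeness, copy_mode is set on the intermediate MemoryDataset. A minimal sketch with a hypothetical dataset name (the YAML equivalent is a catalog entry with type: MemoryDataset and copy_mode: assign):

```python
from kedro.io import DataCatalog, MemoryDataset

# Hypothetical intermediate dataset name. "assign" hands the object through
# by reference instead of deep-copying it (the default), which is what a
# lazy object holding a live client needs.
catalog = DataCatalog({"lazy_bq_frame": MemoryDataset(copy_mode="assign")})
```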