# questions
c
Hello, are there any resources on running Kedro with Dask and Google BigQuery? I have a pandas.GBQTableDataset in my catalog, and when I run `kedro run --runner=fdr_explorer_backend.runner.DaskRunner` it fails with this error: `PicklingError: Pickling client objects is explicitly not supported. Clients have non-trivial state that is local and unpickleable.` I believe it comes from the Google packages:
- gcsfs=2023.12.2.post1
- google-cloud-bigquery=3.14.1
- pandas-gbq=0.19.2
Any ideas?
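For illustration, a hypothetical catalog entry of the kind described above, written with the kedro-datasets Python API; the dataset, table, and project names are all placeholders:

```python
# Hypothetical GBQ catalog entry; every name below is a placeholder.
from kedro_datasets.pandas import GBQTableDataset

gbq_table = GBQTableDataset(
    dataset="my_bq_dataset",    # BigQuery dataset ID
    table_name="my_table",      # BigQuery table name
    project="my-gcp-project",   # GCP project; uses the default if omitted
)
df = gbq_table.load()           # reads the table into a pandas DataFrame
```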
j
hey @Clement, could you share the complete traceback?
and also the Kedro and kedro-datasets versions you have
c
Sure, I've attached my logs and environment.yml.
Hi @Juan Luis, sorry to bring this up again. Do you have any idea how to serialize the client? I'm still having trouble figuring this out.
n
I don't think you can serialise it in this case.
Cc @Deepyaman Datta, do you think this is related to the `copy_mode` discussion we had?
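A minimal sketch (not from the thread) of why the serialisation fails: google-cloud client objects deliberately refuse to be pickled, so nothing that crosses a process boundary can carry one. The usual workaround is to construct the client inside the function that runs on the worker; `load_table` and its `table_id` argument are hypothetical names for illustration:

```python
# Sketch of the failure mode and the standard workaround.
import pickle

from google.cloud import bigquery

client = bigquery.Client()  # assumes default GCP credentials are configured

try:
    pickle.dumps(client)
except pickle.PicklingError as err:
    # "Pickling client objects is explicitly not supported..."
    print(err)

def load_table(table_id: str):
    # Build the client inside the function each worker executes, so only
    # the picklable table_id crosses process boundaries.
    local_client = bigquery.Client()
    return local_client.query(f"SELECT * FROM `{table_id}`").to_dataframe()
```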
d
Hmm... it's been a long time since I've worked with Kedro and Dask, but what are you passing between nodes? If you're following the Dask deployment guide, I believe it would be a custom `_DaskDataset` by default. Also, is this the pickling of the Google Cloud client, or the Dask client? From the traceback, I think it's the Google Cloud one.
> Cc @Deepyaman Datta, do you think this is related to the `copy_mode` discussion we had?
Probably; if you're passing something that depends on the Google Cloud client, it should probably be lazy and use `copy_mode='assign'`.
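A minimal sketch of that suggestion, assuming Kedro >= 0.19 naming (`kedro.io.MemoryDataset`) and a hypothetical dataset name: `copy_mode="assign"` hands the object to downstream nodes by reference, so Kedro never deep-copies (and therefore never pickles) the client it holds. Note this only avoids copies within a single process; it does not make the object shippable across Dask workers:

```python
# Hypothetical in-memory dataset registered with copy_mode="assign" so the
# client-bearing object is passed by reference, not deep-copied/pickled.
from kedro.io import DataCatalog, MemoryDataset

catalog = DataCatalog(
    {
        "bq_handle": MemoryDataset(copy_mode="assign"),  # hypothetical name
    }
)
```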