Jannik Wiedenhaupt
12/10/2024, 10:16 PM
Hall
12/10/2024, 10:16 PM
Deepyaman Datta
12/10/2024, 10:19 PM
pandas.GBQTableDataset or BigFrames or something else?
Jannik Wiedenhaupt
12/10/2024, 10:24 PM
Deepyaman Datta
12/10/2024, 10:33 PM
ParallelRunner?
Jannik Wiedenhaupt
12/10/2024, 10:34 PM
Deepyaman Datta
12/10/2024, 10:36 PM
Jannik Wiedenhaupt
12/10/2024, 10:44 PM
AttributeError: The following tables cannot be used with multiprocessing: [TABLE_NAMES]
Deepyaman Datta
12/10/2024, 11:19 PM
pandas.GBQTableDataset implementation. The bigquery.Client is constructed in the __init__() method of the dataset, and that is probably not serializable (something along the lines of https://cloud.google.com/python/docs/reference/dataproc/latest/multiprocessing).
It should be possible to solve this by delaying connection until first use, e.g. in load() or save(). The pandas SQL and Ibis datasets all do this.
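For anyone following along, here is a rough sketch of the lazy-connection idea described above. It is not the actual kedro-datasets code: FakeClient is a hypothetical stand-in for bigquery.Client (it holds a threading.Lock, which is one common reason an object cannot be pickled), and EagerDataset, LazyDataset and is_picklable are illustrative names only. A real custom dataset would presumably subclass Kedro's AbstractDataset and create the real client in the same deferred way.

```python
import pickle
import threading


class FakeClient:
    """Hypothetical stand-in for bigquery.Client; the lock makes it unpicklable."""

    def __init__(self, project):
        self.project = project
        self._lock = threading.Lock()  # lock objects cannot be pickled

    def query(self, sql):
        return f"[{self.project}] {sql}"


class EagerDataset:
    """Builds the client in __init__, so the dataset itself cannot be pickled."""

    def __init__(self, project, table):
        self._table = table
        self._client = FakeClient(project)


class LazyDataset:
    """Stores only plain configuration; the client is created on first use."""

    def __init__(self, project, table):
        self._project = project
        self._table = table
        self._client = None

    def _get_client(self):
        # Connect only when a load actually needs the client.
        if self._client is None:
            self._client = FakeClient(self._project)
        return self._client

    def load(self):
        return self._get_client().query(f"SELECT * FROM {self._table}")

    def __getstate__(self):
        # Drop an already-created client so the dataset stays picklable after load().
        state = self.__dict__.copy()
        state["_client"] = None
        return state


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False
```

With this sketch, is_picklable(EagerDataset(...)) is False, while is_picklable(LazyDataset(...)) stays True even after load() has been called, which is presumably the property ParallelRunner needs.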
You can always do something like this yourself by defining a custom dataset. @Ravi Kumar Pilla @Merel @Juan Luis may be able to confirm if this can be squeezed into the imminent 6.0.0 release; I could probably do this tonight or tomorrow.
Deepyaman Datta
12/11/2024, 7:24 AM
Deepyaman Datta
12/11/2024, 3:51 PM
Deepyaman Datta
12/17/2024, 3:09 PM