Jannik Wiedenhaupt
12/10/2024, 10:16 PM
Hall
12/10/2024, 10:16 PM
Deepyaman Datta
12/10/2024, 10:19 PM
pandas.GBQTableDataset or BigFrames or something else?
Jannik Wiedenhaupt
12/10/2024, 10:24 PM
Deepyaman Datta
12/10/2024, 10:33 PM
ParallelRunner?
Jannik Wiedenhaupt
12/10/2024, 10:34 PM
Deepyaman Datta
12/10/2024, 10:36 PM
Jannik Wiedenhaupt
12/10/2024, 10:44 PM
AttributeError: The following tables cannot be used with multiprocessing: [TABLE_NAMES]
Deepyaman Datta
12/10/2024, 11:19 PM
pandas.GBQTableDataset implementation. The bigquery.Client is constructed in the __init__() method of the dataset, and the client is probably not serializable (something along the lines of https://cloud.google.com/python/docs/reference/dataproc/latest/multiprocessing).
It should be possible to solve this by delaying connection until first use, e.g. in load() or save(). The pandas SQL and Ibis datasets all do this.
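A minimal sketch of the "delay connection until first use" idea, using only the standard library. FakeClient, EagerDataset, LazyDataset, and is_picklable are hypothetical names for illustration, not Kedro or BigQuery APIs; the lock inside FakeClient is an assumption that stands in for whatever makes a real client unpicklable.

```python
import pickle
import threading


class FakeClient:
    """Hypothetical stand-in for a client object such as bigquery.Client.

    It holds a lock, which pickle refuses to serialize, to mimic an
    object that cannot be sent to worker processes.
    """

    def __init__(self, project):
        self.project = project
        self._lock = threading.Lock()

    def read_table(self, table_name):
        return f"rows of {self.project}.{table_name}"


class EagerDataset:
    """Builds the client in __init__, so the dataset cannot be pickled."""

    def __init__(self, project, table_name):
        self._table_name = table_name
        self._client = FakeClient(project)

    def load(self):
        return self._client.read_table(self._table_name)


class LazyDataset:
    """Stores only plain configuration; the client is built on first use."""

    def __init__(self, project, table_name):
        self._project = project
        self._table_name = table_name
        self._client = None  # no connection yet

    @property
    def client(self):
        # Create the client the first time load() needs it.
        if self._client is None:
            self._client = FakeClient(self._project)
        return self._client

    def load(self):
        return self.client.read_table(self._table_name)


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (TypeError, AttributeError, pickle.PicklingError):
        return False
    return True
```

In this sketch a freshly constructed LazyDataset can be pickled, while an EagerDataset cannot. Once load() has been called, the lazy version also holds a client, so it would need to be pickled before first use or drop the client when serialized.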
You can always do something like this yourself by defining a custom dataset. @Ravi Kumar Pilla @Merel @Juan Luis may be able to confirm if this can be squeezed into the imminent 6.0.0 release; I could probably do this tonight or tomorrow.
Deepyaman Datta
12/11/2024, 7:24 AM
Deepyaman Datta
12/11/2024, 3:51 PM
Deepyaman Datta
12/17/2024, 3:09 PM