# questions
m
hello everyone 🙂 after a quick search I don't think anyone has had this issue. I'm trying to download a large amount of data from BigQuery as Parquet files. This takes a long time and ends with a 403 error (results too large). I tried the pandas.GBQuery and pandas.GBTable dataset types, and I tried to refine the query. What would be your advice in this case, please? Thank you very much!
m
If you really need to export it to Parquet, just use an export job: https://cloud.google.com/bigquery/docs/exporting-data#export-data-in-bigquery
m
In fact, it is a requirement to do it via Kedro, so I was wondering how to do it in this case 🥲. Otherwise, of course, I would do it directly on GCP.
m
You can create a node in Kedro that will do the export 🙂
from google.cloud import bigquery

client = bigquery.Client()
bucket_name = "my-bucket"  # destination GCS bucket
project = "bigquery-public-data"
dataset_id = "samples"
table_id = "shakespeare"

destination_uri = "gs://{}/{}".format(bucket_name, "shakespeare.csv")
dataset_ref = bigquery.DatasetReference(project, dataset_id)
table_ref = dataset_ref.table(table_id)

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    # Location must match that of the source table.
    location="US",
)  # API request
extract_job.result()  # Waits for job to complete.

print(
    "Exported {}:{}.{} to {}".format(project, dataset_id, table_id, destination_uri)
)
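Since the requirement is Parquet rather than CSV, the same extract job can be given a destination format and wrapped in a Kedro node. A minimal sketch, where the bucket name, parameter names and node name are placeholders rather than anything from this thread:

from google.cloud import bigquery
from kedro.pipeline import node


def export_table_to_parquet(project: str, dataset_id: str, table_id: str) -> str:
    """Run a server-side BigQuery extract job and return the GCS destination URI."""
    client = bigquery.Client()
    # Hypothetical bucket; the wildcard lets BigQuery shard large tables into multiple files.
    destination_uri = f"gs://my-bucket/{table_id}-*.parquet"
    job_config = bigquery.ExtractJobConfig(
        destination_format=bigquery.DestinationFormat.PARQUET
    )
    client.extract_table(
        bigquery.DatasetReference(project, dataset_id).table(table_id),
        destination_uri,
        location="US",  # must match the source table's location
        job_config=job_config,
    ).result()  # wait for the job to finish
    return destination_uri


# Hypothetical wiring; the "params:" entries and node name are placeholders.
export_node = node(
    export_table_to_parquet,
    inputs=["params:project", "params:dataset_id", "params:table_id"],
    outputs="exported_parquet_uri",
    name="export_bq_to_parquet",
)

Because the export runs server-side, the node never pulls the result set through pandas, which is what triggers the "results too large" 403 in the first place.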
Or even a custom Kedro Dataset.
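For the custom dataset route, a minimal sketch of what such a dataset could look like, assuming a recent Kedro (the base class is spelled AbstractDataSet in older releases); the class name and constructor arguments are illustrative, not an existing kedro-datasets API:

from google.cloud import bigquery
from kedro.io import AbstractDataset  # AbstractDataSet in older Kedro versions


class BigQueryExtractDataset(AbstractDataset):
    """Hypothetical dataset: saving a fully-qualified table id exports it to GCS as Parquet."""

    def __init__(self, destination_uri: str, location: str = "US"):
        self._destination_uri = destination_uri  # e.g. "gs://my-bucket/out-*.parquet"
        self._location = location

    def _save(self, table_id: str) -> None:
        # table_id is "project.dataset.table"; run a server-side extract job.
        client = bigquery.Client()
        job_config = bigquery.ExtractJobConfig(
            destination_format=bigquery.DestinationFormat.PARQUET
        )
        client.extract_table(
            table_id,
            self._destination_uri,
            location=self._location,
            job_config=job_config,
        ).result()

    def _load(self) -> str:
        # Downstream nodes get the GCS URI of the exported files.
        return self._destination_uri

    def _describe(self) -> dict:
        return {"destination_uri": self._destination_uri, "location": self._location}

A node can then simply return the fully-qualified table id with this dataset declared as its output in the catalog, and the export runs as part of the pipeline.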
m
Ok, thank you for your answer!