# questions
m
hello everyone 🙂 after a quick search I don't think anyone has had this issue. I'm trying to download a large amount of data from BigQuery as Parquet files. This takes a long time and ends with a 403 error (results too large). I tried the pandas.GBQuery and pandas.GBTable dataset types, and I tried to refine the query. What would be your advice in this case, please? Thank you very much!
m
If you really need to export it to Parquet, just use an export job: https://cloud.google.com/bigquery/docs/exporting-data#export-data-in-bigquery
m
In fact, it is a requirement to do it via Kedro, so I was wondering how to do it in this case 🥲. Otherwise, of course, I would do it directly on GCP.
m
You can create a node in Kedro that will do the export 🙂
from google.cloud import bigquery

client = bigquery.Client()
bucket_name = "my-bucket"  # destination GCS bucket
project = "bigquery-public-data"
dataset_id = "samples"
table_id = "shakespeare"

destination_uri = "gs://{}/{}".format(bucket_name, "shakespeare.csv")
dataset_ref = bigquery.DatasetReference(project, dataset_id)
table_ref = dataset_ref.table(table_id)

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    # Location must match that of the source table.
    location="US",
)  # API request
extract_job.result()  # Waits for job to complete.

print(
    "Exported {}:{}.{} to {}".format(project, dataset_id, table_id, destination_uri)
)
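Since the requirement is Parquet rather than CSV, the same extract job can be given a destination format and wrapped in a Kedro node. A minimal sketch, where the bucket name, parameter names and node name are placeholders rather than anything from this thread:

from google.cloud import bigquery
from kedro.pipeline import node


def export_table_to_parquet(project: str, dataset_id: str, table_id: str) -> str:
    """Run a server-side BigQuery extract job and return the GCS destination URI."""
    client = bigquery.Client()
    # Hypothetical bucket; the wildcard lets BigQuery shard large tables into multiple files.
    destination_uri = f"gs://my-bucket/{table_id}-*.parquet"
    job_config = bigquery.ExtractJobConfig(
        destination_format=bigquery.DestinationFormat.PARQUET
    )
    client.extract_table(
        bigquery.DatasetReference(project, dataset_id).table(table_id),
        destination_uri,
        location="US",  # must match the source table's location
        job_config=job_config,
    ).result()  # wait for the job to finish
    return destination_uri


# Hypothetical wiring; the "params:" entries and node name are placeholders.
export_node = node(
    export_table_to_parquet,
    inputs=["params:project", "params:dataset_id", "params:table_id"],
    outputs="exported_parquet_uri",
    name="export_bq_to_parquet",
)

Because the export runs server-side, the node never pulls the result set through pandas, which is what triggers the "results too large" 403 in the first place.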
Or even a custom Kedro Dataset.
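For the custom dataset route, a minimal sketch of what such a dataset could look like, assuming a recent Kedro (the base class is spelled AbstractDataSet in older releases); the class name and constructor arguments are illustrative, not an existing kedro-datasets API:

from google.cloud import bigquery
from kedro.io import AbstractDataset  # AbstractDataSet in older Kedro versions


class BigQueryExtractDataset(AbstractDataset):
    """Hypothetical dataset: saving a fully-qualified table id exports it to GCS as Parquet."""

    def __init__(self, destination_uri: str, location: str = "US"):
        self._destination_uri = destination_uri  # e.g. "gs://my-bucket/out-*.parquet"
        self._location = location

    def _save(self, table_id: str) -> None:
        # table_id is "project.dataset.table"; run a server-side extract job.
        client = bigquery.Client()
        job_config = bigquery.ExtractJobConfig(
            destination_format=bigquery.DestinationFormat.PARQUET
        )
        client.extract_table(
            table_id,
            self._destination_uri,
            location=self._location,
            job_config=job_config,
        ).result()

    def _load(self) -> str:
        # Downstream nodes get the GCS URI of the exported files.
        return self._destination_uri

    def _describe(self) -> dict:
        return {"destination_uri": self._destination_uri, "location": self._location}

A node can then simply return the fully-qualified table id with this dataset declared as its output in the catalog, and the export runs as part of the pipeline.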
m
Ok, thank you for your answer!