Mohamed El Guendouz
02/26/2025, 1:59 PM
I'm trying to read a BigQuery table with spark.SparkDataset, but I'm getting an error saying that I need to configure the project ID. Has anyone encountered this issue before?
Spark session:
spark.jars.packages: io.delta:delta-spark_2.12:3.2.0
spark.jars: https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop3-latest.jar,https://repo1.maven.org/maven2/com/google/cloud/spark/spark-bigquery-with-dependencies_2.12/0.36.1/spark-bigquery-with-dependencies_2.12-0.36.1.jar
spark.sql.extensions: io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog: org.apache.spark.sql.delta.catalog.DeltaCatalog
spark.hadoop.fs.gs.impl: com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
Error:
DatasetError: Failed while loading data from dataset SparkDataset(file_format=bigquery, filepath=/tmp/dummy.parquet, load_args={'table': project_id.dataset_id.table_id}, save_args={}).
An error occurred while calling o45.load.
: com.google.cloud.spark.bigquery.repackaged.com.google.inject.ProvisionException: Unable to provision, see the following errors:
1) [Guice/ErrorInCustomProvider]: IllegalArgumentException: A project ID is required for this service but could not be determined from the builder or the environment. Please set a project ID using the
builder.
at SparkBigQueryConnectorModule.provideSparkBigQueryConfig(SparkBigQueryConnectorModule.java:102)
while locating SparkBigQueryConfig
Learn more:
https://github.com/google/guice/wiki/ERROR_IN_CUSTOM_PROVIDER
1 error
======================
Full classname legend:
======================
SparkBigQueryConfig: "com.google.cloud.spark.bigquery.SparkBigQueryConfig"
SparkBigQueryConnectorModule: "com.google.cloud.spark.bigquery.SparkBigQueryConnectorModule"
========================
End of classname legend:
========================
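For context, this ProvisionException is raised by the spark-bigquery connector when it cannot determine a Google Cloud project to bill the read against. A minimal sketch of the same read outside Kedro, with placeholder names throughout — parentProject is the documented connector option that supplies the billing project the error is asking for:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# "parentProject" names the project billed for the read; "table" is the
# fully qualified table to read. Both values are placeholders.
df = (
    spark.read.format("bigquery")
    .option("parentProject", "my-gcp-project")
    .option("table", "project_id.dataset_id.table_id")
    .load()
)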
Laura Couto
02/26/2025, 2:36 PM
You can set it in the spark.yml file in your Kedro project. How are you passing the project ID to the Spark session?

Mohamed El Guendouz
02/27/2025, 9:19 AM
table_name:
  type: spark.SparkDataset
  file_format: bigquery
  filepath: "/tmp/dummy.parquet"
  load_args:
    table: "project_id.dataset_id.table_id"
Laura Couto
02/27/2025, 12:38 PM

Mohamed El Guendouz
02/27/2025, 2:19 PM
• com.google.cloud.spark.bigquery.SparkBigQueryConfig
• com.google.cloud.spark.bigquery.SparkBigQueryConnectorModule
Laura Couto
02/27/2025, 2:23 PM

Mohamed El Guendouz
02/27/2025, 2:32 PM
spark.jars.packages: io.delta:delta-spark_2.12:3.2.0
spark.jars: https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop3-latest.jar,https://repo1.maven.org/maven2/com/google/cloud/spark/spark-bigquery-with-dependencies_2.12/0.36.1/spark-bigquery-with-dependencies_2.12-0.36.1.jar
spark.sql.extensions: io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog: org.apache.spark.sql.delta.catalog.DeltaCatalog
spark.hadoop.fs.gs.impl: com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
Mohamed El Guendouz
02/27/2025, 2:33 PM

Nok Lam Chan
02/27/2025, 2:42 PM

Laura Couto
02/27/2025, 2:42 PM
.config('parentProject', 'google-project-ID')
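In a Kedro project the session is normally built from spark.yml, whose keys the starter template passes to SparkSession.builder.config(), so Laura's suggestion could also be expressed there — a sketch, with a placeholder project ID:

# spark.yml — forwarded to SparkSession.builder.config(), equivalent to
# .config('parentProject', 'google-project-ID') in code
parentProject: google-project-ID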
Mohamed El Guendouz
03/03/2025, 2:25 PM
table_name:
  type: spark.SparkDataset
  file_format: bigquery
  filepath: "/tmp/dummy.parquet"
  load_args:
    dataset: "project_id"
    table: "project_id.dataset_id.table_name"
I also verified that the service account (SA) had the following roles:
⢠Storage Object Viewer
⢠BigQuery Data Viewer
⢠BigQuery Read Session User
After properly configuring the credentials, and without altering the Spark configuration I shared with you, I was able to read the BigQuery table successfully.
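For completeness, one common way to make credentials available to the connector (an assumption — Mohamed did not post this part) is a service-account key referenced via the connector's credentialsFile option, which would also travel through load_args; paths and IDs below are placeholders. Alternatively, Application Default Credentials via the GOOGLE_APPLICATION_CREDENTIALS environment variable work without any catalog changes.

# catalog.yml — sketch combining the working entry above with an
# explicit key file; credentialsFile value is a placeholder.
table_name:
  type: spark.SparkDataset
  file_format: bigquery
  filepath: "/tmp/dummy.parquet"
  load_args:
    dataset: "project_id"
    table: "project_id.dataset_id.table_name"
    credentialsFile: "/path/to/service-account.json"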
Nok Lam Chan
03/03/2025, 2:25 PM

Nok Lam Chan
03/03/2025, 2:26 PM

Mohamed El Guendouz
03/03/2025, 2:31 PM
The fix was adding the dataset parameter in the load_args. I believe it would indeed be useful to include this configuration in the documentation.