Mohamed El Guendouz
10/17/2024, 2:10 PM
table_name:
  type: spark.DeltaTableDataset
  filepath: "gs://XXXX/poc-kedro/table_name/*.parquet"
Could you tell me what might be wrong with this?
Additionally, could you explain how to specify the credentials for accessing the table with this Dataset?
Ravi Kumar Pilla
10/17/2024, 2:18 PM
Mohamed El Guendouz
10/17/2024, 2:22 PM
delta.tables library. This leads to a DatasetError in Kedro, preventing the data from being loaded successfully.
Mohamed El Guendouz
10/17/2024, 2:23 PM
  File "/opt/anaconda3/lib/python3.11/site-packages/kedro/io/core.py", line 202, in load
    raise DatasetError(message) from exc
kedro.io.core.DatasetError: Failed while loading data from data set DeltaTableDataset(filepath=XXXXXXX/poc-kedro/table_name/*.parquet, fs_prefix=gs://).
'JavaPackage' object is not callable
Mohamed El Guendouz
10/17/2024, 2:24 PM
Mohamed El Guendouz
10/17/2024, 2:24 PM
Nok Lam Chan
10/17/2024, 2:28 PM
gs instead of gcs, or is it just a typo?
Mohamed El Guendouz
10/17/2024, 2:29 PM
gs://
Nok Lam Chan
10/17/2024, 2:29 PM
gcs? Not sure if I am missing anything here.
Mohamed El Guendouz
10/17/2024, 2:31 PM
gcs:
  File "/opt/anaconda3/lib/python3.11/site-packages/kedro/io/core.py", line 202, in load
    raise DatasetError(message) from exc
kedro.io.core.DatasetError: Failed while loading data from data set DeltaTableDataset(filepath=XXXXX/poc-kedro/table_name/*.parquet, fs_prefix=gcs://).
'JavaPackage' object is not callable
Nok Lam Chan
10/17/2024, 2:32 PM
Nok Lam Chan
10/17/2024, 2:32 PM
Mohamed El Guendouz
10/17/2024, 2:33 PM
spark.driver.maxResultSize: 3g
spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
spark.sql.execution.arrow.pyspark.enabled: true
spark.sql.extensions: io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog: org.apache.spark.sql.delta.catalog.DeltaCatalog
spark.databricks.delta.properties.defaults.compatibility.symlinkFormatManifest.enabled: true
# https://docs.kedro.org/en/stable/integrations/pyspark_integration.html#tips-for-maximising-concurrency-using-threadrunner
spark.scheduler.mode: FAIR
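A common cause of `'JavaPackage' object is not callable` is that the Delta Lake JARs are not on Spark's classpath, and the settings above contain no `spark.jars.packages` entry. The following is only a minimal sketch of a sanity check, assuming the settings have been read into a plain dict. The helper name `missing_delta_settings` and the list of keys it inspects are illustrative assumptions, not part of Kedro, PySpark, or Delta Lake, and JARs can also be supplied by other means that this check would not see.

```python
# Hypothetical helper: not part of Kedro, PySpark, or Delta Lake.
# Assumption: a Delta-enabled session is configured through these three keys.
KEYS_EXPECTED_FOR_DELTA = (
    "spark.jars.packages",
    "spark.sql.extensions",
    "spark.sql.catalog.spark_catalog",
)


def missing_delta_settings(conf: dict) -> list:
    """Return the expected keys that are absent or do not mention Delta."""
    missing = []
    for key in KEYS_EXPECTED_FOR_DELTA:
        value = str(conf.get(key, ""))
        if "delta" not in value.lower():
            missing.append(key)
    return missing


# Settings copied from the message above.
conf = {
    "spark.driver.maxResultSize": "3g",
    "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "spark.sql.execution.arrow.pyspark.enabled": True,
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    "spark.scheduler.mode": "FAIR",
}

print(missing_delta_settings(conf))  # ['spark.jars.packages']
```

If the list is non-empty, the flagged keys are a reasonable first place to look before debugging the catalog entry itself.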
Ravi Kumar Pilla
10/17/2024, 2:35 PM
Ravi Kumar Pilla
10/17/2024, 2:41 PM
Mohamed El Guendouz
10/17/2024, 2:58 PM
spark.driver.maxResultSize: 3g
spark.jars.packages: io.delta:delta-core_2.12:2.0.0
spark.jars: https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop3-latest.jar
spark.sql.execution.arrow.pyspark.enabled: true
spark.sql.extensions: io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog: org.apache.spark.sql.delta.catalog.DeltaCatalog
spark.databricks.delta.properties.defaults.compatibility.symlinkFormatManifest.enabled: true
# https://docs.kedro.org/en/stable/integrations/pyspark_integration.html#tips-for-maximising-concurrency-using-threadrunner
spark.scheduler.mode: FAIR
spark.hadoop.fs.gs.impl: com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
spark.hadoop.fs.gs.auth.service.account.enable: true
spark.hadoop.google.cloud.auth.service.account.json.keyfile: XXXX.json
However, it's still not recognizing my table… And GCS isn’t working on my end, but with GS, they are able to fetch the table.
Ravi Kumar Pilla
10/17/2024, 3:15 PM
gcs://. I am not sure how gs is working for you. But with gs://, is your issue resolved?
Ravi Kumar Pilla
10/17/2024, 3:23 PM
Mohamed El Guendouz
10/17/2024, 3:25 PM
Ravi Kumar Pilla
10/17/2024, 3:29 PM
Mohamed El Guendouz
10/17/2024, 3:36 PM
  File "/opt/anaconda3/lib/python3.11/site-packages/kedro/runner/runner.py", line 494, in _run_node_sequential
    inputs[name] = catalog.load(name)
                   ^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/kedro/io/data_catalog.py", line 515, in load
    result = dataset.load()
             ^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/kedro/io/core.py", line 202, in load
    raise DatasetError(message) from exc
kedro.io.core.DatasetError: Failed while loading data from data set DeltaTableDataset(filepath=XXXXXX/poc-kedro/table_name, fs_prefix=gs://).
`gs://XXXXXX/poc-kedro/table_name` is not a Delta table.
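Delta Lake keeps its transaction log in a `_delta_log` directory at the table root, so a directory that holds only Parquet files (for example, partition folders written by a plain Parquet writer) is reported as "not a Delta table". The sketch below shows that check for a local path only. The helper name `has_delta_log` and the demo directory names are hypothetical, and checking a `gs://` location would need a GCS client, which is not shown here.

```python
import tempfile
from pathlib import Path


def has_delta_log(table_root) -> bool:
    """True if `table_root` has a `_delta_log` folder with at least one JSON commit file."""
    log_dir = Path(table_root) / "_delta_log"
    return log_dir.is_dir() and any(log_dir.glob("*.json"))


# Demo on throwaway local directories (names are made up for illustration).
with tempfile.TemporaryDirectory() as tmp:
    # A partitioned Parquet layout without a transaction log.
    plain = Path(tmp) / "plain_parquet"
    (plain / "processed_at=2024-10-17").mkdir(parents=True)

    # A layout that has a transaction log with one commit file.
    delta = Path(tmp) / "delta_table"
    (delta / "_delta_log").mkdir(parents=True)
    (delta / "_delta_log" / "00000000000000000000.json").write_text("{}")

    print(has_delta_log(plain))  # False
    print(has_delta_log(delta))  # True
```

If the table root has no `_delta_log`, the data was probably written as plain Parquet, and pointing the dataset at a glob or at an individual part file will not change the outcome.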
Ravi Kumar Pilla
10/17/2024, 3:56 PM
weather@delta:
  type: spark.DeltaTableDataset
  filepath: data/02_intermediate/data.parquet
Mohamed El Guendouz
10/17/2024, 4:02 PM
Mohamed El Guendouz
10/17/2024, 4:10 PM
kedro.io.core.DatasetError: Failed while loading data from data set DeltaTableDataset(filepath=XXXXXX/poc-kedro/table_name/processed_at=XXXXXX/part-00000-0076fd68-4ca3-46f6-982f-e77c539af8a1.c000.snappy.parquet, fs_prefix=gs://). `gs://XXXXXX/poc-kedro/table_name/processed_at=XXXXXX/part-00000-0076fd68-4ca3-46f6-982f-e77c539af8a1.c000.snappy.parquet` is not a Delta table.
Mohamed El Guendouz
10/18/2024, 9:22 AM