Mohamed El Guendouz
01/20/2025, 6:41 PM
When I try to load a dataset of type spark.DeltaTableDataset, I receive a message stating that the table does not exist. This is expected, since the table hasn't been created yet. However, my goal is to initialize the table with data that I will subsequently provide.
Unfortunately, DeltaTableDataset does not support write operations. Does anyone know how to handle the initialization of a Delta table in this scenario?
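For illustration, one possible shape of such an initialization, as a minimal sketch assuming a Delta-enabled Spark session and a purely hypothetical path and schema, is to pre-create an empty table with delta-spark's builder API:

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Create an empty Delta table at an illustrative location if it does not
# exist yet, so that a read-only dataset can load it afterwards.
(
    DeltaTable.createIfNotExists(spark)
    .location("/data/my_delta_table")  # hypothetical path
    .addColumn("id", "INT")
    .addColumn("name", "STRING")
    .execute()
)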
Currently, I am working on a custom hook using the @hook_impl decorator:
from kedro.framework.hooks import hook_impl
from kedro.pipeline.node import Node

@hook_impl
def before_dataset_loaded(self, dataset_name: str, node: Node) -> None:
    # My logic to initialize the Delta table
    ...
The idea is to initialize the Delta table (if it doesn’t already exist) using PySpark within this hook. However, I am struggling to dynamically retrieve the schema of the table for its creation.
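A rough sketch of what such a hook could look like, assuming the table path and schema are looked up from a mapping maintained by the project (the dataset name, path, and schema below are all illustrative, since dynamic schema retrieval is the open question):

from delta.tables import DeltaTable
from kedro.framework.hooks import hook_impl
from kedro.pipeline.node import Node
from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

class DeltaInitHooks:
    # Hypothetical mapping: dataset name -> (table path, schema). In practice
    # this could be derived from catalog metadata or parameters instead.
    TABLES = {
        "my_delta_dataset": (
            "/data/my_delta_table",
            StructType([
                StructField("id", IntegerType(), nullable=False),
                StructField("name", StringType(), nullable=True),
            ]),
        ),
    }

    @hook_impl
    def before_dataset_loaded(self, dataset_name: str, node: Node) -> None:
        entry = self.TABLES.get(dataset_name)
        if entry is None:
            return
        path, schema = entry
        spark = SparkSession.builder.getOrCreate()
        # Only create the table when the path does not already hold one.
        if not DeltaTable.isDeltaTable(spark, path):
            spark.createDataFrame([], schema).write.format("delta").save(path)

The hook class would then be registered through HOOKS in the project's settings.py as usual.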
If anyone has encountered a similar situation or has insights on how to resolve this, I would greatly appreciate your help!
Thank you in advance for your support!
Hall
01/20/2025, 6:41 PM
Huong Nguyen
01/21/2025, 9:37 AM
You can use DeltaTable.is_deltatable() as a check before working with the delta table:
https://github.com/delta-io/delta-rs/pull/2715
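For reference, a minimal sketch of that check with the deltalake (delta-rs) Python package, assuming a release that includes the linked PR and using an illustrative path:

from deltalake import DeltaTable

table_path = "/data/my_delta_table"  # hypothetical location

# is_deltatable() reports whether the path already contains a Delta table,
# so initialization can be done first when it does not.
if DeltaTable.is_deltatable(table_path):
    existing = DeltaTable(table_path)
    print(existing.version())
else:
    print("No Delta table at this path yet; initialize it before loading.")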
Mohamed El Guendouz
01/24/2025, 7:58 PM