FIRAS ALAMEDDINE
01/27/2025, 8:53 PM
Hall
01/27/2025, 8:53 PM
Juan Luis
01/28/2025, 11:11 AM
"But since I started using Unity Catalog, these paths become relative, and the absolute paths start having hash keys in them."
Could you clarify this a bit more?
FIRAS ALAMEDDINE
01/28/2025, 2:52 PM
ingest_dict = {
    "df1": "dbfs:/Filestore/tables/path/to/df1",
    "df2": "dbfs:/Filestore/tables/path/to/df2",
    # ...
}
dict_ingest_catalog = {}
for table in ingest_dict:
    a_df = SparkDataSet(
        filepath=ingest_dict[table],
        file_format='parquet',
        load_args={"header": True, "inferSchema": True, "nullValue": "NA"},
        save_args={"sep": ",", "header": True, "mode": "overwrite"},
    )
    dict_ingest_catalog[table] = a_df
full_catalog = DataCatalog(dict_ingest_catalog)
FIRAS ALAMEDDINE
01/28/2025, 2:56 PM
details = spark.sql(f"DESCRIBE DETAIL {catalog}.{schema}.{tableName}").collect()
location = details[0]['location']
I can get locations for a pipeline's input files. However, since its output files don't exist yet, I cannot get their location and I cannot predefine a format that is similar to location. Therefore, my data catalog is not complete.
Juan Luis
01/28/2025, 3:16 PM
ManagedTableDataset instead of SparkDataset?
DataCatalog.from_config(
    {
        "nyctaxi_trips": {
            "type": "databricks.ManagedTableDataset",
            "catalog": "samples",
            "database": "nyctaxi",
            "table": "trips",
        }
    }
)
(from https://github.com/astrojuanlu/kedro-databricks-demo/blob/main/First%20Steps%20with%20Kedro%20on%20Databricks.ipynb)
FIRAS ALAMEDDINE
01/28/2025, 3:26 PM
FIRAS ALAMEDDINE
01/28/2025, 6:08 PM
df = spark.read.table(f"{external_catalog}.{external_schema}.{raw_table}")
then modified a bit, then written using
df.write.mode("overwrite").saveAsTable(f"{catalog}.{schema}.{tableName}")
In order to use a config that is similar to what you wrote, is it mandatory to write these input files another way? Maybe something like:
from kedro_datasets.databricks import ManagedTableDataset
dataset = ManagedTableDataset(table=tableName, catalog=catalog, database=schema, write_mode="overwrite")
dataset.save(df)
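For reference, a minimal sketch of how a name-based config like the one above could be generated for several tables at once, including pipeline outputs that have not been written yet. It assumes that ManagedTableDataset addresses a table by its catalog, database and table names rather than by a storage location, so no DESCRIBE DETAIL lookup should be needed. The helper managed_table_entries and the names my_catalog, my_schema, raw_trips and clean_trips are hypothetical, made up for illustration.

```python
from typing import Dict, Iterable


def managed_table_entries(
    catalog: str,
    database: str,
    tables: Iterable[str],
    write_mode: str = "overwrite",
) -> Dict[str, dict]:
    """Build one Kedro catalog entry per table, keyed by the table name.

    Hypothetical helper: it only assembles plain dictionaries and never
    touches Spark, so it works for tables that do not exist yet.
    """
    return {
        table: {
            "type": "databricks.ManagedTableDataset",
            "catalog": catalog,
            "database": database,
            "table": table,
            "write_mode": write_mode,
        }
        for table in tables
    }


# Hypothetical names: one existing input table and one future output table.
config = {
    **managed_table_entries("my_catalog", "my_schema", ["raw_trips"]),
    **managed_table_entries("my_catalog", "my_schema", ["clean_trips"]),
}

# On a Databricks cluster with kedro and kedro-datasets installed, the
# resulting dictionary could then be passed on (not executed here):
# from kedro.io import DataCatalog
# full_catalog = DataCatalog.from_config(config)
```

The resulting dictionary has the same shape as the nyctaxi_trips entry quoted earlier, with write_mode added as in the ManagedTableDataset call above.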
FIRAS ALAMEDDINE
01/28/2025, 6:38 PM
Juan Luis
01/28/2025, 8:54 PM