Flavien
09/29/2023, 2:23 PMkedro
. We are using kedro
on databricks
using the ManagedTableDataSet
. Everything was working smoothly until yesterday when we decided to change the name of the table were upserting into.
Inside the code of ManagedTableDataSet
for upsert, there is this piece of code
self._get_spark().conf.set(
"fullTableAddress", self._table.full_table_location()
)
self._get_spark().conf.set("whereExpr", where_expr)
upsert_sql = """MERGE INTO ${fullTableAddress} base USING update ON ${whereExpr}
WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *"""
self._get_spark().sql(upsert_sql)
Using the configuration from the Spark session, the full path to the table is injected (I would be interested to know why it was used this way instead of inserting the string, maybe some protection?).
It turns out that, in our case, when the table name contains hourl
, databricks
obfuscates the name and it breaks the request. Here is a minimal example which we could reproduce the problem with
full_table_location = "`silver`.`weather`.`heck_hourl`"
spark.conf.set("a", full_table_location)
spark.sql("""SELECT * FROM ${a} LIMIT 1""")
resulting in
ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near '*'.(line 1, pos 14)
== SQL ==
SELECT * FROM *********(redacted) LIMIT 1
--------------^^^
instead of indicating that the table does not exist.
I don't know if there is something in our configuration which would trigger such a weird behavior. If any of you has encountered this unexpected result, please let me know. If it can be reproduced by someone else, it would be a relief and we may have to implement something for mitigating this in ManagedTableDataSet
.
Thanks for your feedback. 😅datajoely
09/29/2023, 2:25 PM