# questions
**Ankit Kansal:**
Hi Team, we are using a Databricks setup to run Kedro pipelines with the databricks.ManagedTable dataset. Our workflow:
1. We develop the code in notebooks, then run and test it.
2. We have a small tweak to the ManagedTable class so it accepts a file path, since only external tables are allowed in our workspace.
3. We deploy the code through Azure Pipelines.

We are facing challenges here:
a. The code fails to read the source tables using the databricks.ManagedTable dataset. Can someone help? It looks like either a cluster version issue or some package discrepancies.
@Juan Cruz
I know the issue: we just need to update the databricks.ManagedTable dataset so it accepts a file path variable (a sketch of this kind of tweak is below). I have done that, but it works in one workspace and not in another; in the other it gives an import error.
(error screenshot attached)
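A tweak along those lines might look roughly like the following minimal sketch, assuming a recent Kedro and a Spark/Delta setup. The class name `ExternalTableDataset`, the `location` parameter, and the internals are illustrative, not the actual claimscostutilities code:

```python
# Minimal sketch only: a Kedro dataset that registers a Delta table at an
# explicit external path, for workspaces where only external tables are
# allowed. All names here are illustrative, not the real custom class.
from kedro.io import AbstractDataset
from pyspark.sql import DataFrame, SparkSession


class ExternalTableDataset(AbstractDataset[DataFrame, DataFrame]):
    def __init__(self, *, table: str, database: str, location: str, write_mode: str = "overwrite"):
        self._full_name = f"{database}.{table}"
        self._location = location  # e.g. an abfss:// or dbfs:/ path (hypothetical)
        self._write_mode = write_mode

    def _load(self) -> DataFrame:
        return SparkSession.builder.getOrCreate().table(self._full_name)

    def _save(self, data: DataFrame) -> None:
        # Passing an explicit path makes Spark create an *external* Delta
        # table instead of a managed one.
        (
            data.write.format("delta")
            .mode(self._write_mode)
            .option("path", self._location)
            .saveAsTable(self._full_name)
        )

    def _describe(self) -> dict:
        return {"table": self._full_name, "location": self._location}
```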
**Juan Luis:**
hi @Ankit Kansal! what happens if you do
```python
from claimscostutilities.kedro_datasets.custom_managed_table_dataset import ManagedTableDatasets
```
? (I expect an ImportError of sorts; this will give you a hint at what's happening)
**Ankit Kansal:**
Hi @Juan Luis, thanks for checking. The above import works without giving an error.
(screenshot: image.png attached)
**Juan Luis:**
got it. how are you loading Kedro in the first case?
```python
%load_ext kedro.ipython
```
or something else?
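(For reference, the other common way to load Kedro in a Databricks notebook is to bootstrap the project and create a session by hand. A rough sketch for recent Kedro versions follows; the project path is hypothetical:)

```python
# Sketch of loading Kedro without the IPython extension (recent Kedro versions).
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

project_root = Path("/Workspace/Repos/<user>/<project>")  # hypothetical path
bootstrap_project(project_root)  # reads pyproject.toml and registers the project

with KedroSession.create(project_path=project_root) as session:
    session.run(pipeline_name="__default__")
```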
**Ankit Kansal:**
This is my requirements.txt:
(screenshot: image.png attached)
Two observations:
1. The same code works when you run it from the Databricks Git workspace.
2. The code fails with the above exception when run from a Databricks workspace folder.
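One way to pin down a discrepancy like this is to run a quick check in both locations and compare where the package resolves from. A small diagnostic along these lines (assuming kedro and kedro-datasets are installed via requirements.txt):

```python
# Run this in both the Git workspace and the workspace folder, then diff the output.
import sys
import importlib.metadata

import claimscostutilities  # the custom package wrapping the dataset

print(sys.executable)                # which Python interpreter the notebook uses
print(claimscostutilities.__file__)  # where the package is actually imported from
print(importlib.metadata.version("kedro"))
print(importlib.metadata.version("kedro-datasets"))
```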
**Juan Luis:**
thanks @Ankit Kansal, that's helpful. this tells me that the Databricks notebook probably has some missing dependency somehow. In your https://kedro-org.slack.com/archives/C03RKP2LW64/p1701165302683509?thread_ts=1701096140.614109&cid=C03RKP2LW64 screenshot, when you do
```python
session.run(pipeline_name="...")
```
is that when you see the dataset error?
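(If it is, one way to narrow it down is to load the failing dataset directly from the catalog before a full run. The `kedro.ipython` extension injects a `catalog` variable into the notebook; the dataset name below is hypothetical:)

```python
# After %load_ext kedro.ipython, a `catalog` variable is available in the notebook.
df = catalog.load("source_table")  # hypothetical dataset name from the catalog
df.show(5)  # if this fails, the problem is the dataset itself, not the pipeline run
```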