# questions
**Ankit Kansal:**
Hi Team, we are using a Databricks setup to run Kedro pipelines with the databricks.ManagedTable dataset. Our workflow:
1. We develop the code in notebooks, then run and test it.
2. We have a small tweak to the ManagedTable class so it accepts a file path, since only external tables are allowed in our workspace.
3. We deploy the code through Azure Pipelines.

We are facing challenges here:
a. The code fails to read the source tables using the databricks.ManagedTable dataset. Can someone help? It looks like either a cluster version issue or some package discrepancies.
@Juan Cruz
I know the issue: we just need to update the databricks.ManagedTable dataset so it accepts a file path variable (a sketch of this kind of tweak is below). I have done that, but it works in one workspace and not in another; in the other it gives an import error.
(error screenshot attached)
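A tweak along those lines might look roughly like the following minimal sketch, assuming a recent Kedro and a Spark/Delta setup. The class name `ExternalTableDataset`, the `location` parameter, and the internals are illustrative, not the actual claimscostutilities code:

```python
# Minimal sketch only: a Kedro dataset that registers a Delta table at an
# explicit external path, for workspaces where only external tables are
# allowed. All names here are illustrative, not the real custom class.
from kedro.io import AbstractDataset
from pyspark.sql import DataFrame, SparkSession


class ExternalTableDataset(AbstractDataset[DataFrame, DataFrame]):
    def __init__(self, *, table: str, database: str, location: str, write_mode: str = "overwrite"):
        self._full_name = f"{database}.{table}"
        self._location = location  # e.g. an abfss:// or dbfs:/ path (hypothetical)
        self._write_mode = write_mode

    def _load(self) -> DataFrame:
        return SparkSession.builder.getOrCreate().table(self._full_name)

    def _save(self, data: DataFrame) -> None:
        # Passing an explicit path makes Spark create an *external* Delta
        # table instead of a managed one.
        (
            data.write.format("delta")
            .mode(self._write_mode)
            .option("path", self._location)
            .saveAsTable(self._full_name)
        )

    def _describe(self) -> dict:
        return {"table": self._full_name, "location": self._location}
```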
**Juan Luis:**
hi @Ankit Kansal! what happens if you do
```python
from claimscostutilities.kedro_datasets.custom_managed_table_dataset import ManagedTableDatasets
```
? (I expect an ImportError of sorts; this will give you a hint at what's happening)
**Ankit Kansal:**
Hi @Juan Luis, thanks for checking. The above import works without giving an error.
(screenshot: image.png attached)
**Juan Luis:**
got it. how are you loading Kedro in the first case?
```python
%load_ext kedro.ipython
```
or something else?
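(For reference, the other common way to load Kedro in a Databricks notebook is to bootstrap the project and create a session by hand. A rough sketch for recent Kedro versions follows; the project path is hypothetical:)

```python
# Sketch of loading Kedro without the IPython extension (recent Kedro versions).
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

project_root = Path("/Workspace/Repos/<user>/<project>")  # hypothetical path
bootstrap_project(project_root)  # reads pyproject.toml and registers the project

with KedroSession.create(project_path=project_root) as session:
    session.run(pipeline_name="__default__")
```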
**Ankit Kansal:**
This is my requirements.txt:
(screenshot: image.png attached)
Two observations:
1. The same code works when you run it from the Databricks Git workspace.
2. The code fails with the above exception when run from a Databricks workspace folder.
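One way to pin down a discrepancy like this is to run a quick check in both locations and compare where the package resolves from. A small diagnostic along these lines (assuming kedro and kedro-datasets are installed via requirements.txt):

```python
# Run this in both the Git workspace and the workspace folder, then diff the output.
import sys
import importlib.metadata

import claimscostutilities  # the custom package wrapping the dataset

print(sys.executable)                # which Python interpreter the notebook uses
print(claimscostutilities.__file__)  # where the package is actually imported from
print(importlib.metadata.version("kedro"))
print(importlib.metadata.version("kedro-datasets"))
```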
**Juan Luis:**
thanks @Ankit Kansal, that's helpful. this tells me that the Databricks notebook probably has some missing dependency somehow. In your https://kedro-org.slack.com/archives/C03RKP2LW64/p1701165302683509?thread_ts=1701096140.614109&cid=C03RKP2LW64 screenshot, when you do
```python
session.run(pipeline_name="...")
```
is that when you see the dataset error?
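(If it is, one way to narrow it down is to load the failing dataset directly from the catalog before a full run. The `kedro.ipython` extension injects a `catalog` variable into the notebook; the dataset name below is hypothetical:)

```python
# After %load_ext kedro.ipython, a `catalog` variable is available in the notebook.
df = catalog.load("source_table")  # hypothetical dataset name from the catalog
df.show(5)  # if this fails, the problem is the dataset itself, not the pipeline run
```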