Hi all I am having issues trying to run a Databricks job using Kedro #questions

Juan David Patiño Guerra

12/11/2023, 10:16 AM

Hi all, I am having issues trying to run a Databricks job using Kedro. I am trying to follow these instructions from the docs: https://docs.kedro.org/en/0.18.14/deployment/databricks/databricks_deployment_workflow.html However, I get the following error (see attachment), and I have the following folder structure (see attachment). The databricks_run.py file is also packaged in the required .whl file. I'm not sure if I'm missing something important. I am using kedro==0.18.7. Note that I am creating and calling the job using Databricks Asset Bundles (the replacement for dbx); I don't think this is the cause of the issue. Thanks in advance for your help! I can provide more details if needed.

Juan Luis

12/11/2023, 10:20 AM

hello @Juan David Patiño Guerra with that directory structure, assuming your

databricks_run.py

has the contents of the documentation, that import will not work (it would need to be something like

from wine_model_kedro.databricks_run import main

), which hints that maybe your entry point definition is not correct. can you share the relevant part of your

setup.py

and a screenshot of your job definition to make sure that everything is okay?

Juan David Patiño Guerra

12/11/2023, 10:25 AM

Good one. Checking there, I see that the entry_point variable is assigned twice (my bad). However, I am not sure if just keeping the bottom one would make it work. This is the content of the

setup.py

file:

from setuptools import find_packages, setup

entry_point = (
    "wine-model-kedro = wine_model_kedro.__main__:main"
)


# get the dependencies and installs
with open("requirements.txt", encoding="utf-8") as f:
    # Make sure we strip all comments and options (e.g "--extra-index-url")
    # that arise from a modified pip.conf file that configure global options
    # when running kedro build-reqs
    requires = []
    for line in f:
        req = line.split("#", 1)[0].strip()
        if req and not req.startswith("--"):
            requires.append(req)

setup(
    name="wine_model_kedro",
    version="0.1",
    packages=find_packages(exclude=["tests"]),
    entry_points={"console_scripts": [entry_point]},
    install_requires=requires,
    extras_require={
        "docs": [
            "docutils<0.18.0",
            "sphinx~=3.4.3",
            "sphinx_rtd_theme==0.5.1",
            "nbsphinx==0.8.1",
            "nbstripout~=0.4",
            "sphinx-autodoc-typehints==1.11.1",
            "sphinx_copybutton==0.3.1",
            "ipykernel>=5.3, <7.0",
            "Jinja2<3.1.0",
            "myst-parser~=0.17.2",
        ]
    },
)

entry_point = (..., "databricks_run = wine_model_kedro.databricks_run:main")

Juan Luis

12/11/2023, 10:38 AM

yeah the last

entry_point = (...)

presumably has no effect, because it happens after the

setup()

function call. could you try adding that above? like

entry_points = (
  "wine-model-kedro = wine_model_kedro.__main__:main",
  "databricks_run = wine_model_kedro.databricks_run:main"
)

and then use the

databricks_run

entry point for your DB job
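
To illustrate the suggestion above, here is a minimal sketch of how both entry points might be declared before the setup() call. It assumes the rest of setup.py stays as posted; the describe_console_scripts helper is hypothetical and only there to show which module and function each "name = module:function" string points at.

```python
# Both console scripts are declared in one place, before setup() runs.
entry_points = [
    "wine-model-kedro = wine_model_kedro.__main__:main",
    "databricks_run = wine_model_kedro.databricks_run:main",
]

# In setup.py (assumption: everything else stays as posted) the list
# would be passed as:
#     setup(..., entry_points={"console_scripts": entry_points}, ...)


def describe_console_scripts(specs):
    """Hypothetical helper: map each script name to (module, function)."""
    resolved = {}
    for spec in specs:
        name, _, target = spec.partition("=")
        module, _, func = target.strip().partition(":")
        resolved[name.strip()] = (module, func)
    return resolved


if __name__ == "__main__":
    for script, (module, func) in describe_console_scripts(entry_points).items():
        print(f"{script} -> {module}.{func}()")
```

With this layout, the Databricks job would reference the databricks_run entry point, which resolves to main in wine_model_kedro/databricks_run.py.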

Juan David Patiño Guerra

12/11/2023, 5:01 PM

Thanks @Juan Luis! Indeed, a simple mistake. As a follow-up question, are there plans to make this simpler when executing from a Databricks Job? It seems that, compared to the other solutions/environments, doing this as a Job requires some extra effort and loses a bit of the charm that Kedro has had when deploying elsewhere.

Juan Luis

12/11/2023, 5:35 PM

indeed... it's an ongoing issue: https://github.com/kedro-org/kedro/issues/1807. we've reported to Microsoft and Databricks that Click-based entry points don't work, and in the meantime we're trying to come up with a solution, but it's tricky. any ideas are welcome

👍 1

Juan David Patiño Guerra

12/12/2023, 4:36 PM

I am not sure if this question should be a separate thread, but I think you could help here as well @Juan Luis. I am running the Databricks Job, following the instructions mentioned above. However, I now hit the point where kedro-telemetry asks for permission during the CI pipeline, as seen in the attachment. Because there is no user interaction, it hangs there indefinitely. The solution seems to be to either uninstall kedro-telemetry via pip or to run

echo "consent: false" > .telemetry

. The thing is that I don't seem to get it right, as it keeps asking for permission even after I create the .telemetry file (at least that seems to be happening). I am running this as part of a Databricks Asset Bundles pipeline (which works similarly to dbx), so I define the pipeline in a YAML file. Running pip uninstall doesn't seem simple once the cluster is created, so I looked at creating the .telemetry file instead. In the init script (see attachment), I run the command shown above, yet it still asks for permission. Am I missing something or is this a Databricks limitation with Kedro? I can share more info if needed. Thanks in advance (again)!

Juan Luis

12/12/2023, 6:47 PM

The solution seems to be to either uninstall kedro-telemetry via pip or to run
echo "consent: false" > .telemetry
.

in principle that should be it

The thing is that I don't seem to get it right, as it keeps asking for permission even after I create the .telemetry file

probably a matter of where the

.telemetry

file is placed... but again, this is much more complicated than it should be. at this point it's better that you remove

kedro-telemetry

from your dependencies so it doesn't get installed in the cluster. I left a comment in the appropriate issue, we'll likely tackle this at the beginning of next year
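
As a rough sketch of the "where the file is placed" point: a relative path in echo "consent: false" > .telemetry writes into whatever the current working directory happens to be when the init script runs. The hypothetical helper below writes the same content to an explicit directory instead. Which directory the Databricks job treats as the Kedro project root is not established in this thread, so the project_root argument is an assumption the caller has to supply.

```python
from pathlib import Path


def write_telemetry_opt_out(project_root):
    """Hypothetical helper: write '.telemetry' with consent disabled.

    Equivalent to running `echo "consent: false" > .telemetry` from
    inside project_root, but with the target directory made explicit.
    """
    telemetry_file = Path(project_root) / ".telemetry"
    telemetry_file.write_text("consent: false\n", encoding="utf-8")
    return telemetry_file
```

Removing kedro-telemetry from the dependencies, as suggested above, avoids having to guess the right directory at all.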

Juan David Patiño Guerra

12/12/2023, 7:34 PM

Indeed. The thing about uninstalling libraries in the cluster is that it’s not that smooth (not sure if it's even possible) when working with dbx, or with the Databricks API in general. Do you know if there is a way to pip install Kedro without the kedro-telemetry that comes with it by default?

Juan Luis

12/12/2023, 8:34 PM

Kedro does not depend on Kedro-telemetry, so it must be somewhere in your requirements.txt. you can go and remove it
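
For illustration, a small sketch of filtering a package out of requirements.txt lines before packaging. The drop_requirement and normalize_name helpers are hypothetical, not part of Kedro; the comment stripping mirrors the loop in the setup.py posted earlier, and names are compared after PEP 503 style normalization so that kedro_telemetry and kedro-telemetry are treated as the same package.

```python
import re


def normalize_name(name):
    """Normalize a package name (runs of -, _ and . become a single -)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def drop_requirement(lines, package):
    """Hypothetical helper: return lines without entries for `package`."""
    target = normalize_name(package)
    kept = []
    for line in lines:
        # Ignore trailing comments when looking for the package name.
        req = line.split("#", 1)[0].strip()
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", req)
        if match and normalize_name(match.group(0)) == target:
            continue  # skip the unwanted package
        kept.append(line)
    return kept
```

In practice, deleting the kedro-telemetry line from requirements.txt by hand and rebuilding the .whl achieves the same result.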

Juan David Patiño Guerra

12/13/2023, 9:26 AM

Thanks! Indeed, I had to package it again without it. Now I face the following, which is more cryptic (for me) - see attachment. I don't really know why it goes there to look for the configuration path. In theory it should look at the locations that I am giving in the config (see other attachment). Do you know how to proceed here? Thanks for the help so far with this @Juan Luis.

Juan Luis

12/13/2023, 9:37 AM

for visibility, create a new thread @Juan David Patiño Guerra, otherwise only I will see this 😅

Juan Luis

12/13/2023, 9:37 AM

glad we're making progress, sorry it's being such a painful experience

Juan David Patiño Guerra

12/13/2023, 9:43 AM

Sure, will create a new one.
