# questions
j
Hi all, I am having issues trying to run a Databricks job using Kedro. I am trying to follow these instructions from the docs: https://docs.kedro.org/en/0.18.14/deployment/databricks/databricks_deployment_workflow.html However, I get the following error (see attachment): And I have the following folder structure (see attachment): The databricks_run.py file is also packaged in the required .whl file. I'm not sure if I'm missing something important. I am using kedro==0.18.7. It may be relevant that I am creating and calling the job using Databricks Asset Bundles (the replacement for dbx), though I don't think this is the cause of the issue. Thanks in advance for your help on this! I can provide more details if needed.
j
hello @Juan David Patiño Guerra with that directory structure, assuming your
databricks_run.py
has the contents of the documentation, that import will not work (it would need to be something like
from wine_model_kedro.databricks_run import main
), which hints that maybe your entry point definition is not correct. can you share the relevant part of your
setup.py
and a screenshot of your job definition to make sure that everything is okay?
j
Good one. Checking there, I see that the entry_point variable is assigned twice (my bad). However, I am not sure if leaving only the bottom one would make it work. This is the content of the
setup.py
file:
from setuptools import find_packages, setup

entry_point = (
    "wine-model-kedro = wine_model_kedro.__main__:main"
)


# get the dependencies and installs
with open("requirements.txt", encoding="utf-8") as f:
    # Make sure we strip all comments and options (e.g "--extra-index-url")
    # that arise from a modified pip.conf file that configure global options
    # when running kedro build-reqs
    requires = []
    for line in f:
        req = line.split("#", 1)[0].strip()
        if req and not req.startswith("--"):
            requires.append(req)

setup(
    name="wine_model_kedro",
    version="0.1",
    packages=find_packages(exclude=["tests"]),
    entry_points={"console_scripts": [entry_point]},
    install_requires=requires,
    extras_require={
        "docs": [
            "docutils<0.18.0",
            "sphinx~=3.4.3",
            "sphinx_rtd_theme==0.5.1",
            "nbsphinx==0.8.1",
            "nbstripout~=0.4",
            "sphinx-autodoc-typehints==1.11.1",
            "sphinx_copybutton==0.3.1",
            "ipykernel>=5.3, <7.0",
            "Jinja2<3.1.0",
            "myst-parser~=0.17.2",
        ]
    },
)

entry_point = (..., "databricks_run = wine_model_kedro.databricks_run:main")
j
yeah the last
entry_point = (...)
presumably has no effect, because it happens after the
setup()
function call. could you try adding that above? like
entry_points = (
  "wine-model-kedro = wine_model_kedro.__main__:main",
  "databricks_run = wine_model_kedro.databricks_run:main"
)
and then use the
databricks_run
entry point for your DB job
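For reference, a minimal sketch of the entry-point part of setup.py with both scripts declared before setup() runs. The script names and module paths are the ones quoted in this thread; `console_scripts` and `parse_entry_point` are illustrative names, not part of setuptools. Since setuptools expects `console_scripts` to be a flat list of strings, the values are written as a list here rather than wrapped again as `[entry_point]`.

```python
# Sketch of the entry-point section of setup.py (script names taken from the thread).
# Both console scripts are declared BEFORE setup() is called; anything assigned
# after the setup() call has no effect on the built wheel.
console_scripts = [
    "wine-model-kedro = wine_model_kedro.__main__:main",
    "databricks_run = wine_model_kedro.databricks_run:main",
]

# In setup.py this dictionary would be passed as setup(..., entry_points=entry_points).
entry_points = {"console_scripts": console_scripts}


def parse_entry_point(spec):
    """Illustrative helper: split 'name = package.module:function' into its parts."""
    name, target = (part.strip() for part in spec.split("=", 1))
    module, function = target.split(":", 1)
    return name, module, function
```

The helper is only there to make the format visible: the part before `=` is the entry point name to reference in the job definition, and the part after it is the module and function that get called.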
j
Thanks @Juan Luis! Indeed, a simple mistake. As a follow-up question, are there plans to make this simpler to run when executing from a Databricks Job? It seems that, compared to the other solutions/environments, doing this as a Job requires some extra effort and loses a bit of the charm that kedro has had when deploying elsewhere.
j
indeed... it's an ongoing issue https://github.com/kedro-org/kedro/issues/1807 we've reported to Microsoft and Databricks that Click-based entry points don't work, and in the meantime we're trying to come up with a solution, but it's tricky. any ideas are welcome
j
I am not sure if this question should be separate, but I think you could help here as well @Juan Luis. I am running the Databricks Job, following the instructions mentioned above. However, now I hit the point where kedro-telemetry asks for permission during the CI pipeline, as seen in the attachment. Because there is no user interaction, it hangs there indefinitely. The solution seems to be to either uninstall kedro-telemetry via pip or to run
echo "consent: false" > .telemetry
. The thing is that I don't seem to get it right, as it keeps asking for permission even after I create the .telemetry file (at least that seems to be happening). I am running this as part of a Databricks Asset Bundles pipeline (works similarly to dbx), so I define the pipeline in a YAML file. Doing pip uninstall doesn't seem simple once the cluster is created, so I looked at creating the .telemetry file instead. In the init script (see attachment), I run the command shown above. Yet it still asks for permission. Am I missing something, or is this a Databricks limitation with Kedro? I can share more info if needed. Thanks in advance (again)!
j
The solution seems to be to either uninstall kedro-telemetry via pip or to run
echo "consent: false" > .telemetry
.
in principle that should be it
The thing is that I don't seem to get it right, as it keeps asking for permission even after I create the .telemetry file
probably a matter of where the
.telemetry
file is placed... but again, this is much more complicated than it should be already. at this point it's better that you remove
kedro-telemetry
from your dependencies so it doesn't get installed in the cluster. I left a comment in the appropriate issue, we'll likely tackle this at the beginning of next year
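A minimal sketch of the `echo "consent: false" > .telemetry` approach, written in Python so the target directory is explicit. It assumes the file has to sit in the Kedro project root; the thread does not confirm which directory is consulted on a Databricks cluster, and `write_telemetry_consent` and `project_root` are hypothetical names used only for illustration.

```python
from pathlib import Path


def write_telemetry_consent(project_root, consent=False):
    """Write a .telemetry file equivalent to `echo "consent: false" > .telemetry`.

    ASSUMPTION: `project_root` is the directory where the file is looked up.
    An init script may run from a different working directory, which would
    put the file somewhere else.
    """
    path = Path(project_root) / ".telemetry"
    value = "true" if consent else "false"
    path.write_text(f"consent: {value}\n", encoding="utf-8")
    return path
```

Printing the returned path in the job logs is a cheap way to check whether the file ended up where it was expected.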
j
Indeed. The thing about uninstalling libraries in the cluster is that it's not that smooth (not sure if it's even possible) when working with dbx, or with the Databricks API in general. Do you know if there is a way to pip install Kedro without the kedro-telemetry that comes with it by default?
j
Kedro does not depend on Kedro-telemetry, it must be somewhere in your requirements.txt you can go and remove it
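One way to do that removal before packaging is sketched below, assuming a plain requirements.txt with one requirement per line. `drop_requirement` is a hypothetical helper, not a Kedro or pip function; it matches the name in its hyphen and underscore spellings.

```python
import re


def drop_requirement(lines, package="kedro-telemetry"):
    """Return the requirement lines with every entry for `package` removed."""

    def normalise(name):
        # Treat runs of '-', '_' and '.' as equivalent and ignore case
        return re.sub(r"[-_.]+", "-", name).lower()

    kept = []
    for line in lines:
        # Ignore trailing comments when looking for the package name
        req = line.split("#", 1)[0].strip()
        # The distribution name is the leading run of name characters
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", req)
        if match and normalise(match.group(0)) == normalise(package):
            continue
        kept.append(line)
    return kept
```

The filtered lines can then be written back to requirements.txt before rebuilding the .whl file, so the package is never installed on the cluster.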
j
Thanks! Indeed, I had to package it again without it. Now I face the following, which is more cryptic (for me) - see attachment. I don't really know why it goes there to look for the configuration path. In theory it should look at the locations that I am giving in the config (see other attachment). Do you know how to proceed here? Thanks for the help so far with this @Juan Luis.
j
for visibility, create a new thread @Juan David Patiño Guerra, otherwise only I will see this 😅
glad we're making progress, sorry it's been such a painful experience
j
Sure, will create a new one.