Hi Kedro Folks. I'm trying to create a `LivyRunner...
# questions
a
Hi Kedro folks. I'm trying to create a `LivyRunner` so I can submit jobs to an EMR cluster through Livy. I'm using Kedro 0.18.4. I need to pass the code to Livy as a string. Has anyone created something similar? Any help is really appreciated. I'm trying to pass the code in `_run` to Livy. How do I figure out which pipeline and node to run? We do have the following parameters in the `_run` function, but they cannot be passed into the string.
```python
def _run(
    self,
    pipeline: Pipeline,
    catalog: DataCatalog,
    hook_manager: PluginManager,
    session_id: str = None,
) -> None:
```
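One possible way around this: the `Pipeline` and its nodes are Python objects, but node names are plain strings, so they can be written into the code that goes to Livy. A minimal sketch, assuming the pipeline passed to `_run` exposes `.nodes` and each node has a `.name` (as Kedro's `Pipeline` and `Node` do); `node_names_literal` is a hypothetical helper, not part of Kedro:

```python
from typing import Any


def node_names_literal(pipeline: Any) -> str:
    """Return the pipeline's node names as a Python list literal.

    Hypothetical helper: `pipeline` is anything exposing `.nodes`, where
    each node has a `.name` string.
    """
    names = [node.name for node in pipeline.nodes]
    # repr() of a list of strings is valid Python source, so the result can
    # be pasted directly into the code string submitted to Livy.
    return repr(names)
```

Inside `_run`, something like `node_names_literal(pipeline)` would then give a string that the remote code can use to select the same nodes by name.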
d
Can you do anything with an `after_context_created` or `before_pipeline_run` hook here?
a
I don't think this can be done in the hooks. I'm trying to do something like a `load_context` and pass the code in `_run` as a string to Livy. Something like the below (not complete):
```python
import json
import textwrap

import requests

# Code that Livy will execute on the cluster (not complete). Names such as
# `node`, `catalog`, `hook_manager`, `self` and `session_id` only exist inside
# `_run` on the submitting side, so they are still undefined in this string.
cmd = textwrap.dedent("""
    import json
    import sys
    import time
    from collections import Counter
    from itertools import chain

    import requests
    from pluggy import PluginManager

    from kedro.io import AbstractDataSet, DataCatalog, MemoryDataSet
    from kedro.pipeline import Pipeline
    from kedro.runner.runner import AbstractRunner, run_node

    run_node(node, catalog, hook_manager, self._is_async, session_id)
""")

data = {"code": cmd, "kind": "pyspark"}

# `session_url` and `headers` are defined elsewhere in the runner.
statements_url = session_url + "/statements"
r = requests.post(statements_url, data=json.dumps(data), headers=headers)
```
@datajoely How do I get the `load_context` and the set of pipelines and nodes being run (the ones passed to the `_run` function)?
How do I get the current session and context? I was using kedro==0.16 earlier; 0.18 seems a little different.
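For reference, a minimal sketch of how this tends to look in 0.18, where (as far as I know) the context is obtained from a `KedroSession` instead of a standalone `load_context` import, and the registered pipelines come from `kedro.framework.project`. `load_project` is a hypothetical helper; the Kedro imports sit inside the function so that defining it does not require Kedro to be importable.

```python
from pathlib import Path
from typing import Any, Dict, Optional, Tuple


def load_project(project_path: str, env: Optional[str] = None) -> Tuple[Any, Any, Dict[str, Any]]:
    """Return (session, context, pipelines) for a Kedro 0.18 project.

    Hypothetical helper; `project_path` must point at the project root.
    """
    from kedro.framework.project import pipelines
    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    # Adds the project's source directory to sys.path and configures settings.
    bootstrap_project(Path(project_path))
    session = KedroSession.create(project_path=project_path, env=env)
    context = session.load_context()
    # `pipelines` is a lazy mapping of registered pipeline names to Pipeline objects.
    return session, context, dict(pipelines)
```

With that, something like `pipelines["__default__"].nodes` lists the nodes of the default pipeline, and `context.catalog` gives the data catalog.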