Afaque Ahmad
01/19/2023, 9:10 AM
LivyRunner to be able to submit jobs to an EMR cluster using Livy. I'm using Kedro 0.18.4. I need to pass the code as a string to Livy. Has anyone created something similar? Any help is really appreciated.
I'm trying to pass the code in _run to Livy. How do I figure out which pipeline and node to run? We do have the following parameters in the _run function, but they cannot be passed into the string.
def _run(
    self,
    pipeline: Pipeline,
    catalog: DataCatalog,
    hook_manager: PluginManager,
    session_id: str = None,
) -> None:
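A minimal sketch of one way around this, assuming the Kedro project is also deployed on the cluster: because the objects passed to _run cannot be serialised into the string, only the node names are embedded, and the remote code rebuilds the session itself. build_remote_code and the project path are hypothetical; the generated snippet relies on bootstrap_project and KedroSession.run(node_names=...) from Kedro 0.18, and whether that is the right way to execute inside a Livy session is an assumption.

```python
import textwrap


def build_remote_code(node_names, project_path):
    """Return Python source that re-runs the given nodes on the cluster.

    Hypothetical helper: only plain strings are embedded, never live objects.
    """
    template = textwrap.dedent(
        """
        from pathlib import Path

        from kedro.framework.session import KedroSession
        from kedro.framework.startup import bootstrap_project

        # Assumes the project exists at this path on the cluster.
        project_path = Path({project_path!r})
        bootstrap_project(project_path)
        with KedroSession.create(project_path=project_path) as session:
            session.run(node_names={node_names!r})
        """
    )
    return template.format(
        project_path=str(project_path),
        node_names=list(node_names),
    )


# Inside _run, the names could come from the pipeline argument:
#     code = build_remote_code([n.name for n in pipeline.nodes], "/path/on/cluster")
code = build_remote_code(["split_data", "train_model"], "/home/hadoop/my-project")
print(code)
```

The resulting string contains no references to self, catalog or hook_manager, so it can be sent as the "code" field of a Livy statement.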
datajoely
01/19/2023, 9:14 AM
after_context_created or before_pipeline_run hook here?
Afaque Ahmad
01/19/2023, 9:20 AM
_run as a string to Livy.
Something like the below (not complete):
import json
import textwrap

import requests

# Code to execute remotely, sent to Livy as a plain string.
cmd = textwrap.dedent("""
    import json
    import sys
    import time
    from collections import Counter
    from itertools import chain
    import requests
    from pluggy import PluginManager
    from kedro.io import AbstractDataSet, DataCatalog, MemoryDataSet
    from kedro.pipeline import Pipeline
    from kedro.runner.runner import AbstractRunner, run_node
    # node, catalog, hook_manager, self and session_id are not defined on the remote side yet
    run_node(node, catalog, hook_manager, self._is_async, session_id)
""")

# Wrap the code in a Livy statement and submit it (session_url and headers are defined elsewhere).
data = {
    "code": cmd,
    "kind": "pyspark",
}
statements_url = session_url + '/statements'
r = requests.post(statements_url, data=json.dumps(data), headers=headers)
load_context and the set of pipelines and nodes running (the ones passed to the _run function)?
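For the statements request in the snippet above, a rough sketch of building the payload and then polling for the result could look like this. It assumes the standard Livy REST endpoints (POST to the session's /statements URL, then GET on the returned statement id); statement_request, wait_for_statement and the host name are hypothetical names used only for illustration, and the HTTP call is injected so the helpers work with requests or any other client.

```python
import json
import time


def statement_request(session_url, code, kind="pyspark"):
    """Build the URL, body and headers for POST {session_url}/statements."""
    url = session_url.rstrip("/") + "/statements"
    body = json.dumps({"code": code, "kind": kind})
    headers = {"Content-Type": "application/json"}
    return url, body, headers


def wait_for_statement(get_json, statement_url, poll_seconds=1.0, max_polls=60):
    """Poll a statement until Livy reports a terminal state.

    get_json is any callable returning the decoded JSON for a URL,
    e.g. lambda u: requests.get(u, headers=headers).json().
    """
    for _ in range(max_polls):
        statement = get_json(statement_url)
        if statement["state"] in ("available", "error", "cancelled"):
            return statement
        time.sleep(poll_seconds)
    raise TimeoutError(f"statement still running after {max_polls} polls")


# Hypothetical session URL; with requests this would be followed by:
#     r = requests.post(url, data=body, headers=headers)
#     result = wait_for_statement(
#         lambda u: requests.get(u, headers=headers).json(),
#         f"{url}/{r.json()['id']}",
#     )
url, body, headers = statement_request("http://livy-host:8998/sessions/0", "print(1 + 1)")
```

Keeping the payload construction separate from the HTTP call makes it possible to check the generated code string locally before anything is sent to the cluster.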