#questions

WEN XIN (Jessie 文馨)

02/03/2023, 4:47 AM
Hi team, is there any guide on submitting a spark job to EMR through livy for a kedro project?

datajoely

02/03/2023, 9:54 AM
We don’t have great docs on this to be honest - the simplest way is to run kedro package and deploy it that way

WEN XIN (Jessie 文馨)

02/07/2023, 11:26 AM
@datajoely thanks a lot! Something like this?
• run kedro package to create the egg file and deploy to S3 location
• in airflow livyoperator, pass location of the file to py_files parameter
• create a python file to import this packaged kedro project and pass sys.argv to main. deploy this file to S3 location
from test_livy.__main__ import main
import sys

if __name__ == "__main__":
    main(sys.argv)
• in airflow livy operator, pass location of the file to file parameter
• kedro commands are passed as "args" parameter in airflow livy operator
# airflow task
from airflow.providers.apache.livy.operators.livy import LivyOperator

# `dag` is assumed to be an airflow DAG object defined earlier in the same file
t1 = LivyOperator(
    task_id="run_kedro_pipeline",
    driver_memory="1g",
    num_executors=1,
    executor_memory="1g",
    executor_cores=1,
    polling_interval=30,
    file="s3://{{ var.json.AWS_BUCKETS.app.name}}/applications/spark/emr/test_kedro_livy.py",
    args=["--pipeline", "test", "--params", "pipeline:test,app_name:test,ds:{{ ds }}"],
    dag=dag,
    livy_conn_id="livy_emr",
)
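
For reference, the operator above ends up submitting a batch to Livy's REST API (POST /batches). A minimal sketch of the request body those settings map to is shown below; the helper name `build_livy_batch_payload`, the bucket `my-bucket` and the egg file name are hypothetical and only there for illustration, while the JSON field names (`file`, `args`, `pyFiles`, `driverMemory`, and so on) are the ones Livy's batch API uses.

```python
import json


def build_livy_batch_payload(
    file,
    args,
    py_files=None,
    driver_memory="1g",
    executor_memory="1g",
    executor_cores=1,
    num_executors=1,
):
    """Build the JSON body for a Livy batch submission (hypothetical helper)."""
    payload = {
        "file": file,              # entry-point script that calls the packaged project's main
        "args": list(args),        # forwarded to the script as command line arguments
        "driverMemory": driver_memory,
        "executorMemory": executor_memory,
        "executorCores": executor_cores,
        "numExecutors": num_executors,
    }
    if py_files:
        # packaged kedro project (egg/wheel) made importable on the cluster
        payload["pyFiles"] = list(py_files)
    return payload


# Hypothetical S3 locations, not taken from the thread
payload = build_livy_batch_payload(
    file="s3://my-bucket/applications/spark/emr/test_kedro_livy.py",
    args=["--pipeline", "test"],
    py_files=["s3://my-bucket/applications/spark/emr/test_livy.egg"],
)
print(json.dumps(payload, indent=2))
```

Printing the payload this way can help when debugging a failed batch, since it can be compared with what the Livy UI or logs report for the submission.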

datajoely

02/07/2023, 11:27 AM
did it work?
I think that looks sensible

WEN XIN (Jessie 文馨)

02/07/2023, 11:28 AM
trying to test it, I don't always have an environment to work on 👀

datajoely

02/07/2023, 11:28 AM
okay it looks sensible
if you do find a solution we’d love to write some docs on this ❤️
(or accept a contribution 😛 )
let us know how it goes and we’ll do our best to support

WEN XIN (Jessie 文馨)

02/07/2023, 12:34 PM
on another related question, is it true that kedro will only work with spark deployment mode "client"? https://github.com/kedro-org/kedro/issues/529

datajoely

02/07/2023, 12:57 PM
I’m not entirely sure, but that issue is quite old so Kedro itself will have changed since then

WEN XIN (Jessie 文馨)

02/08/2023, 3:59 AM
okie okie. so we tried to run with --deployment-mode cluster, and it gives spark error exitCode 13. changing to 'client' seems to bypass that issue for now. @datajoely do you have any good suggestions on deploying the "conf" files to EMR for the packaged kedro project? we thought about a bootstrap action, but that would restart the cluster every time?
d

datajoely

02/08/2023, 10:13 AM
this is something that will improve in the next version of kedro
but for now you would have to write your own procedure to put this folder in the right place
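
One possible shape for such a procedure, as a rough sketch only: ship the conf folder as a tar.gz archive and unpack it next to where the packaged project runs. This assumes the packaged project looks for a conf folder in its working directory; the function name `stage_conf` and all paths are hypothetical, and how the archive reaches the node (S3 download, an EMR step, and so on) is left out.

```python
import tarfile
import tempfile
from pathlib import Path


def stage_conf(archive_path, project_root):
    """Extract a conf archive so that <project_root>/conf exists (hypothetical helper)."""
    project_root = Path(project_root)
    project_root.mkdir(parents=True, exist_ok=True)
    root = project_root.resolve()
    with tarfile.open(archive_path, "r:gz") as archive:
        # Refuse archive members that would be written outside project_root
        for member in archive.getmembers():
            target = (root / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        archive.extractall(root)
    return root / "conf"


# Small local demo: build a throwaway conf archive, then stage it
work = Path(tempfile.mkdtemp())
base = work / "src" / "conf" / "base"
base.mkdir(parents=True)
(base / "parameters.yml").write_text("app_name: test\n")

archive_path = work / "conf.tar.gz"
with tarfile.open(archive_path, "w:gz") as archive:
    archive.add(work / "src" / "conf", arcname="conf")

conf_dir = stage_conf(archive_path, work / "project")
print(sorted(p.name for p in (conf_dir / "base").iterdir()))
```

In the setup from this thread, a call like this could sit at the top of the entry-point script, before main is invoked, so the conf folder is in place without a cluster restart.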

WEN XIN (Jessie 文馨)

02/08/2023, 10:13 AM
that's awesome! looking forward to that update
we tested that the steps above are working after the conf files are also deployed onto the cluster