Abhishek Bhatia
10/14/2024, 9:49 AM
<https://my.web.app/api/v1/some-task>
a. Body includes parameters to trigger 1 or multiple kedro pipelines as a Vertex AI DAG
My VertexAI DAG has a combination of nodes (steps), and each node:
1. May or may not be a kedro pipeline
2. May be a PySpark workload running on Dataproc, or a non-Spark workload running on a single compute VM
3. May run a bigquery job
4. May or may not run in a docker container
Let's take the example of submitting a kedro pipeline to Dataproc Serverless, running in a custom Docker container, using the Vertex AI SDK.
Questions:
1. Do you package the kedro code as part of the Docker container or just the dependencies?
For example, I have seen this done a lot, which packages the kedro code as well:

```dockerfile
RUN mkdir /usr/kedro
WORKDIR /usr/kedro/
COPY . .
```

which means copying the whole project, and then in src/entrypoint.py:

```python
from kedro.framework import cli
import os

os.chdir("/usr/kedro")
cli.main()
```
2. Do I need to package my kedro project as a wheel file and submit it with the job to Dataproc? If so, how have you seen that done with DataprocPySparkBatchOp?
3. How do you recommend to pass dynamic parameters to the kedro pipeline run?
As I understand, cli.main() picks up sys.argv to infer the pipeline name and parameters, so one could do:

```shell
kedro run --pipeline <my_pipeline> --params=param_key1=value1,param_key2=2.0
```

Is there a better recommended way of doing this?
Thanks a lot and hoping for a good discussion! 🙂

Ravi Kumar Pilla
10/14/2024, 1:59 PM
1. COPY . . should work fine
2. If you want to minimize the size of the Docker image and separate code from infra, the cleaner approach is a .whl file
3. I am not aware of DataprocPySparkBatchOp, but based on my search you can package your kedro project as a .whl file and submit it to Dataproc
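To make point 3 a bit more concrete, here is a rough sketch (not a tested recipe) of wiring a packaged wheel into DataprocPySparkBatchOp from google-cloud-pipeline-components. The bucket, project, image, and entrypoint names are hypothetical, and the component's exact parameter names should be checked against your installed version, so the component call is shown only in comments:

```python
# Sketch: forwarding Kedro CLI arguments to a Dataproc Serverless batch
# submitted from a Vertex AI (KFP) pipeline. Only the helper below is
# plain Python; the component call is commented because its signature
# depends on the google-cloud-pipeline-components version in use.

def build_kedro_batch_args(pipeline, params=None):
    """Build the argv list forwarded to the Kedro entrypoint on Dataproc."""
    args = ["--pipeline", pipeline]
    if params:
        args += ["--params", ",".join(f"{k}={v}" for k, v in params.items())]
    return args

# Inside a @kfp.dsl.pipeline definition, this could feed the component:
#
#   from google_cloud_pipeline_components.v1.dataproc import DataprocPySparkBatchOp
#
#   DataprocPySparkBatchOp(
#       project="my-project",                                 # hypothetical
#       location="us-central1",
#       main_python_file_uri="gs://my-bucket/entrypoint.py",  # thin wrapper calling main()
#       python_file_uris=["gs://my-bucket/my_project-0.1-py3-none-any.whl"],
#       container_image="gcr.io/my-project/kedro-runtime:latest",
#       args=build_kedro_batch_args("my_pipeline", {"param_key1": "value1"}),
#   )
```

Keeping the argument-building in a small helper like this makes the same pipeline/params plumbing reusable across components.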
4. You can pass dynamic params via the CLI as you mentioned, which works well; you can also pass them via the Vertex AI SDK using main:
```python
from my_project.__main__ import main

main([
    "--pipeline", "<my_pipeline>",
    "--params", "param_key1=value1,param_key2=2.0",
])
```
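For intuition on what that --params string turns into: roughly, Kedro splits it on commas into key=value pairs and coerces numeric-looking values. The real parsing is richer (e.g. OmegaConf-style nested keys), so this is only an illustrative sketch:

```python
def _coerce(value):
    # Try numeric types first, fall back to the raw string.
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

def parse_params(raw):
    """Rough illustration of how a --params string becomes a dict.

    Only flat key=value pairs with int/float coercion are handled here;
    real Kedro supports nesting and more types.
    """
    out = {}
    for pair in raw.split(","):
        key, _, value = pair.partition("=")
        out[key] = _coerce(value)
    return out

# parse_params("param_key1=value1,param_key2=2.0")
# → {"param_key1": "value1", "param_key2": 2.0}
```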
I would also wait for the community to respond in case someone has tried this and has any recommendations.
Thank you

Abhishek Bhatia
10/14/2024, 2:05 PM
conf folder?
I guess the kedro project folder structure has to be brought in somehow in order to execute the project (by cloning it or packaging it with the Docker image)
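On the conf question specifically, one pattern worth noting (a sketch, assuming a recent Kedro version where kedro package also emits a conf archive and kedro run accepts --conf-source) is to ship conf/ separately from the wheel; the file names below follow Kedro's defaults but should be verified for your version:

```shell
# Build the wheel plus a tarball of conf/ (conf is excluded from the wheel)
kedro package

# At runtime, point the packaged project at the shipped configuration
kedro run --pipeline my_pipeline --conf-source dist/conf-my_project.tar.gz
```

This keeps environment-specific configuration out of the Docker image entirely, so the same image can run against different conf archives.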
And thanks for the 4th point 🙂

Abhishek Bhatia
10/14/2024, 2:06 PM