[SOLVED] Hi all, I have a simple kedro pipeline wh...
# questions
x
[SOLVED] Hi all, I have a simple kedro pipeline which reads an input file from S3 and updates a postgres database. We want to use AWS Lambda (containerized) to run this since it is the simplest and cheapest way. However, we hit the `_multiprocessing.SemLock is not implemented` issue when launching the pipeline. A quick Google search brought me to this issue: https://stackoverflow.com/questions/34005930/multiprocessing-semlock-is-not-implemented-when-running-on-aws-lambda
It looks like AWS Lambda's Python runtime is missing `/dev/shm`, which seems to be needed by the `KedroSession`. Has anyone successfully run a kedro pipeline on AWS Lambda? Thanks in advance!
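To check whether a given runtime can create multiprocessing locks at all, a small probe like the one below can be run inside the container before launching the pipeline. This is only a sketch: the helper name `semlock_available` is hypothetical, and it assumes the failure surfaces as an `ImportError` or `OSError` when the lock is constructed.

```python
import multiprocessing


def semlock_available() -> bool:
    """Return True if this runtime can create a multiprocessing lock.

    Hypothetical helper for illustration; not part of kedro.
    """
    try:
        # Constructing a Lock needs working semaphore support underneath.
        lock = multiprocessing.Lock()
    except (ImportError, OSError) as exc:
        # Assumption: runtimes without semaphore support fail at this point.
        print(f"multiprocessing.Lock() failed: {exc!r}")
        return False
    # Acquire and release once to confirm the lock is usable.
    with lock:
        pass
    return True


if __name__ == "__main__":
    print("SemLock available:", semlock_available())
```

If the probe returns False, anything that constructs a `multiprocessing.Lock` in that runtime would be expected to fail in the same way.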
n
Hi @Xinghong Fang Is there any chance you are using Kedro with an older version (<0.18.4)? Recently we fixed a bug that might relate to this. https://github.com/kedro-org/kedro/releases#:~:text=Refactored%20ShelveStore%20to%20its%20own%20module%20to%20ensure%20multiprocessing%20works%20with%20it.
x
Hi @Nok Lam Chan, we are on `0.18.4`. I can only get around the issue if I patch the `multiprocessing.Lock` function (not sure if it is safe to do).
I think our case is slightly different from the spaceflight+step_function one, which converts the kedro pipeline to step functions and only runs a single node in each lambda function. We are trying to run a full kedro pipeline (with only 3-4 nodes) inside a single lambda function.
Currently we are using the code snippet below as our main entrypoint for Lambda:
from pathlib import Path
from unittest.mock import patch

def handle_event(event, context):
    print("================ TEST PATCH ===============")
    with patch("multiprocessing.Lock"):
        from kedro.framework.cli.cli import KedroCLI
        cli_collection = KedroCLI(project_path=Path.cwd())
        cli_collection(args=["run"])
We are using the default sequential runner, but I am still skeptical about patching out the lock.
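For context on why that skepticism is reasonable, here is a minimal sketch of what `patch("multiprocessing.Lock")` does while it is active. The function name `demo_patched_lock` is hypothetical and exists only for this illustration. Inside the `with` block, the module attribute `multiprocessing.Lock` is replaced by a `MagicMock`, so any "lock" created there accepts `with lock:` but provides no mutual exclusion.

```python
import multiprocessing
from unittest.mock import MagicMock, patch


def demo_patched_lock():
    """Show that a patched multiprocessing.Lock is a mock, not a real lock."""
    with patch("multiprocessing.Lock"):
        # The module attribute is now a MagicMock; calling it returns another mock.
        lock = multiprocessing.Lock()
        is_mock = isinstance(lock, MagicMock)
        # MagicMock supports the context-manager protocol, so this "works",
        # but nothing is actually acquired or released.
        with lock:
            pass
    # On leaving the patch block the original attribute is restored.
    restored = not isinstance(multiprocessing.Lock, MagicMock)
    return is_mock, restored


if __name__ == "__main__":
    print(demo_patched_lock())
```

With a single process running nodes one after another there may be nothing for the lock to protect, but the patch only hides the symptom. It also has no effect on code that ran `from multiprocessing import Lock` before the patch was applied, since that code holds its own reference to the original.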
n
Do you have the full traceback? I am curious where the SemLock error is coming from. `kedro` shouldn't have any dependency on `multiprocessing` unless `ShelveStore` is specified in `settings.py`.
x
You're right! My bad, all good here now after a docker image rebuild. I think we might have been using an older version of the image.
from pathlib import Path

def handle_event(event, context):
    from kedro.framework.cli.cli import KedroCLI
    cli_collection = KedroCLI(project_path=Path.cwd())
    cli_collection(args=["run"])
This code now runs without error. Thanks!
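As an alternative to going through the CLI object, the same run can be sketched with `KedroSession` directly. This is not what the thread used. It assumes the Lambda working directory is the kedro project root, and the returned dictionary is a hypothetical placeholder response.

```python
from pathlib import Path


def handle_event(event, context):
    # Imported lazily, mirroring the handler above.
    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    # Assumption: the function's working directory is the kedro project root.
    project_path = Path.cwd()

    # Load project metadata and settings before creating a session.
    bootstrap_project(project_path)

    # Run the default pipeline with the default (sequential) runner.
    with KedroSession.create(project_path=project_path) as session:
        session.run()

    # Hypothetical return value; Lambda only needs something serializable.
    return {"status": "ok"}
```

Calling `session.run()` without arguments runs the default pipeline, which corresponds to a plain `kedro run`.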
n
Awesome, glad to hear it works now. 🙂