SOLVED Hi all I have a simple kedro pipeline which reads in Kedro #questions


Xinghong Fang

02/27/2023, 3:14 AM

[SOLVED] Hi all, I have a simple kedro pipeline which reads an input file from S3 and updates a Postgres database. We want to use AWS Lambda (containerized) to run this since it is the simplest and cheapest way. However, we are hit with the "_multiprocessing.SemLock is not implemented" issue when launching the pipeline. A quick Google search brought me to this issue: https://stackoverflow.com/questions/34005930/multiprocessing-semlock-is-not-implemented-when-running-on-aws-lambda It looks like AWS Lambda's Python runtime is missing /dev/shm, which seems to be needed by the KedroSession. Has anyone successfully run a kedro pipeline on AWS Lambda? Thanks in advance!
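A quick way to confirm whether a given runtime is affected is to try creating a lock directly. The following is a minimal sketch using only the standard library; the function name semlock_available is made up for illustration, and the set of exception types caught is a cautious guess rather than a documented list.

```python
import multiprocessing


def semlock_available() -> bool:
    """Return True if this runtime can create and use a multiprocessing lock."""
    try:
        # multiprocessing.Lock() is backed by a SemLock, which is the piece
        # reported as "not implemented" in the error above.
        lock = multiprocessing.Lock()
    except (ImportError, OSError, NotImplementedError) as exc:
        print(f"multiprocessing.Lock is unavailable: {exc!r}")
        return False
    # Acquire and release once to confirm the lock is actually usable.
    with lock:
        pass
    return True


if __name__ == "__main__":
    print("SemLock available:", semlock_available())
```

Running this inside the same container image as the Lambda function shows whether the failure comes from the runtime itself, independent of kedro.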

Nok Lam Chan

02/27/2023, 7:49 AM

Hi @Xinghong Fang Is there any chance you are using Kedro with an older version (<0.18.4)? Recently we fixed a bug that might relate to this. https://github.com/kedro-org/kedro/releases#:~:text=Refactored%20ShelveStore%20to%20its%20own%20module%20to%20ensure%20multiprocessing%20works%20with%20it.

Xinghong Fang

02/27/2023, 8:35 AM

Hi @Nok Lam Chan, we are on 0.18.4. I can only get around the issue if I patch the multiprocessing.Lock function (not sure if it is safe to do).

Xinghong Fang

02/27/2023, 8:36 AM

I think our case is slightly different from the spaceflight+step_function one, which converts the kedro pipeline to step functions and runs only a single node in each lambda function. We are trying to run a full kedro pipeline (with only 3-4 nodes) inside a single lambda function.

Xinghong Fang

02/27/2023, 8:42 AM

Currently we are using the code snippet below as our main entrypoint for lambda:


from pathlib import Path
from unittest.mock import patch

def handle_event(event, context):
    print("================ TEST PATCH ===============")
    with patch("multiprocessing.Lock"):
        from kedro.framework.cli.cli import KedroCLI
        cli_collection = KedroCLI(project_path=Path.cwd())
        cli_collection(args=["run"])

We are using the default sequential runner, but I still feel skeptical about patching out the lock.
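To make the concern concrete, here is a small standard-library sketch of what patch("multiprocessing.Lock") does. The helper name patched_lock_is_mock is hypothetical. By default, patch swaps the attribute for a MagicMock, so anything that looks up multiprocessing.Lock inside the with block gets a stand-in object that accepts acquire/release calls but provides no mutual exclusion.

```python
import multiprocessing
from unittest.mock import MagicMock, patch


def patched_lock_is_mock() -> bool:
    """Show that a patched multiprocessing.Lock yields a MagicMock, not a lock."""
    with patch("multiprocessing.Lock"):
        lock = multiprocessing.Lock()
        # MagicMock supports the context-manager protocol, so this "works",
        # but nothing is actually being locked.
        with lock:
            pass
        return isinstance(lock, MagicMock)


if __name__ == "__main__":
    print("Patched lock is a mock:", patched_lock_is_mock())
```

With a single process and no concurrent access this is probably harmless, but it hides the underlying problem rather than fixing it, and the real lock is restored as soon as the with block exits.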

Nok Lam Chan

02/27/2023, 8:45 AM

Do you have the full traceback? I am curious where the SemLock error is coming from.

Nok Lam Chan

02/27/2023, 8:47 AM

kedro shouldn’t have any dependency on multiprocessing unless ShelveStore is specified in settings.py.
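A rough way to check this is to scan the project's settings.py for an uncommented mention of ShelveStore. The sketch below is a plain text scan, not a kedro API: the function name uses_shelvestore is hypothetical, and the comment stripping is naive (it does not handle a "#" inside a string literal).

```python
def uses_shelvestore(settings_source: str) -> bool:
    """Return True if any non-comment code in the source mentions ShelveStore."""
    for line in settings_source.splitlines():
        # Drop everything after the first "#" so commented-out template
        # lines are not counted as active configuration.
        code = line.split("#", 1)[0]
        if "ShelveStore" in code:
            return True
    return False


if __name__ == "__main__":
    active = "SESSION_STORE_CLASS = ShelveStore"
    commented = "# SESSION_STORE_CLASS = ShelveStore"
    print(uses_shelvestore(active), uses_shelvestore(commented))
```

To use it on a real project, read the contents of the project's settings.py into a string and pass that in.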

Xinghong Fang

02/27/2023, 9:29 AM

You're right! My bad, all good here now after a docker image rebuild. I think we might have been using an older version of the image.


from pathlib import Path

def handle_event(event, context):
    from kedro.framework.cli.cli import KedroCLI
    cli_collection = KedroCLI(project_path=Path.cwd())
    cli_collection(args=["run"])

This code now runs without error. Thanks!
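Since the root cause was a stale image, it can help to log the installed package version at startup so a mismatch shows up immediately. A minimal sketch using the standard library's importlib.metadata follows; the helper name installed_version is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version found in the current environment, or None.
    print("kedro version:", installed_version("kedro"))
```

Printing this at the top of the Lambda entrypoint makes it easy to confirm from the logs that the deployed image contains the expected kedro release.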

Nok Lam Chan

02/27/2023, 9:30 AM

Awesome, glad to hear it works now. 🙂
