# questions
m
Hello! I use Kedro with SageMaker following this Kedro tutorial, and I have a question: is it possible to use functions defined in nodes inside the `sagemaker_entry_point.py` script? For example:
```python
...
from pipelines.ml_model.model import train_model

...

def main():
    ...
    regressor = train_model(...)
    ...

if __name__ == "__main__":
    # SageMaker will run this script as the main program
    main()
```
I'm asking because I get this error:
```
ModuleNotFoundError: No module named 'pipelines'
```
Thanks for your help 🙂
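For context on that error: `ModuleNotFoundError` means the directory that contains the `pipelines` package is not on `sys.path` inside the SageMaker container. One likely cause is that only the entry point file gets uploaded; with the SageMaker Python SDK, extra source files are usually shipped through the estimator's `source_dir` argument. Once the package is actually present in the container, a minimal sketch for making it importable is below. The helper `ensure_on_sys_path` is hypothetical, and the directory you pass to it depends on where your project code ends up in the container.

```python
import sys
from pathlib import Path


def ensure_on_sys_path(directory: Path) -> str:
    """Prepend `directory` to sys.path (only once) so packages inside it become importable.

    Hypothetical helper: `directory` must be the folder that CONTAINS the
    `pipelines` package in the SageMaker container (an assumption about your layout).
    """
    path_str = str(Path(directory).resolve())
    if path_str not in sys.path:
        sys.path.insert(0, path_str)
    return path_str


# Usage inside sagemaker_entry_point.py (the path is a placeholder, not a real SageMaker location):
# ensure_on_sys_path(Path("/path/to/dir/containing/pipelines"))
# from pipelines.ml_model.model import train_model
```

Putting the directory at the front of `sys.path` means a project package shadows any same-named installed package, which is usually what you want for project code.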
d
I’m not actually sure about this. Since that tutorial was written, the wonderful folks at GetInData have released a plugin that may help: https://kedro-sagemaker.readthedocs.io/en/0.1.1/source/03_quickstart.html
m
Thanks for sharing. Hello @marrrcin, I saw on GitHub that you are a contributor to this plugin. I have a question about it: is it possible to use a Docker image registry other than ECR? Thx
m
@Massinissa Saïdi sure, you can use any Docker registry as long as SageMaker is able to access it.
m
OK, nice! The plugin is very useful and saves a lot of configuration. Thanks 🙂
m
I’m happy to hear that, thanks! Please let us know how it works for you in the long term.
m
Hey @marrrcin, I wanted to reproduce your tutorial but with a public Docker Hub registry, and I get this error:
```
ClientError: Failed to invoke sagemaker:CreateProcessingJob. Error Details: Invalid image URI massisaidi/testimage. Please provide a valid Amazon Elastic Container Registry path of the Docker image to run.
```
This is my `sagemaker.yml`:
```yaml
docker:
  image: "massisaidi/testimage"
  working_directory: /home/kedro
```
The image was pushed correctly to Docker Hub, but SageMaker seems to need an ECR registry, no?
m
Yeah, sorry, my mistake: it seems that’s not possible (even though our plugin itself lets you use any image) 😞 https://stackoverflow.com/a/60330160/1955346
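As the error message above says, SageMaker expects an Amazon ECR image path here. ECR image URIs follow the pattern `<account_id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`, so a Docker Hub name like `massisaidi/testimage` is rejected. A minimal pre-flight check is sketched below; `is_ecr_image_uri` and `build_ecr_image_uri` are hypothetical helpers, not part of kedro-sagemaker, and the regex is a simplified approximation of ECR's naming rules.

```python
import re

# Simplified shape of an Amazon ECR image URI:
#   <12-digit account id>.dkr.ecr.<region>.amazonaws.com/<repository>[:<tag> | @sha256:<digest>]
# (China regions use the amazonaws.com.cn suffix.)
_ECR_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com(?:\.cn)?"
    r"/(?P<repository>[a-z0-9._/-]+)"
    r"(?::(?P<tag>[A-Za-z0-9_.-]+)|@sha256:(?P<digest>[0-9a-f]{64}))?$"
)


def is_ecr_image_uri(image: str) -> bool:
    """Return True if `image` looks like an ECR image URI (Docker Hub names do not match)."""
    return _ECR_URI.match(image) is not None


def build_ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Assemble an ECR image URI from its parts and validate the result."""
    uri = f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"
    if not is_ecr_image_uri(uri):
        raise ValueError(f"Not a valid ECR image URI: {uri}")
    return uri
```

After pushing the image to an ECR repository, a string built this way is what would go into `docker.image` in `sagemaker.yml` instead of the Docker Hub name.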
m
Does the plugin use the SageMaker `Estimator` class? Looking at the docs:
```
image_uri (str or PipelineVariable): If specified, the estimator will use this image for training and hosting, instead of the appropriate SageMaker official image based on framework_version and py_version. It can be an ECR url or dockerhub image and tag.

Examples:
    123.dkr.ecr.us-west-2.amazonaws.com/my-custom-image:1.0
    custom-image:latest
```
It seems we can use a Docker Hub image, no?
m
No, we’re using `ProcessingStep` / `ModelStep`.
m
OK, nice. So for now there is no option to use `kedro sagemaker run --tag` or `--env`?
m
To switch the Kedro environment, just use `kedro sagemaker -e <name of the env> run`.
m
Thanks!
And nothing for `--tag`?
m
No
m
OK, thank you for your help!
Is there an easy way to add `--tag` support on my side?
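Since the plugin has no `--tag` flag, one possible workaround (an assumption, not something the plugin documents) is to register an extra, pre-filtered pipeline in `pipeline_registry.py` and select it with the existing `--pipeline` option. In Kedro itself this filtering is `Pipeline.only_nodes_with_tags(*tags)`, which keeps nodes carrying any of the given tags. The sketch below mimics that any-tag rule in plain Python with a hypothetical `NodeSpec` type and a hypothetical `tag_<name>` naming scheme, so it runs without Kedro installed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical stand-in for a Kedro node: just a name and its tags."""
    name: str
    tags: frozenset = field(default_factory=frozenset)


def only_nodes_with_tags(nodes, *tags):
    """Keep nodes that carry ANY of the given tags; no tags means an empty result."""
    wanted = set(tags)
    return [n for n in nodes if n.tags & wanted]


def register_pipelines(nodes):
    """Hypothetical registry: the full pipeline plus one pre-filtered pipeline per tag."""
    all_tags = sorted({t for n in nodes for t in n.tags})
    registry = {"__default__": list(nodes)}
    for tag in all_tags:
        registry[f"tag_{tag}"] = only_nodes_with_tags(nodes, tag)
    return registry


nodes = [
    NodeSpec("split_data_node", frozenset({"preprocessing"})),
    NodeSpec("train_model_node", frozenset({"training", "model"})),
]
registry = register_pipelines(nodes)
```

With a real Kedro registry built the same way, running the tagged subset would then look like `kedro sagemaker run --pipeline=tag_training`, since `--pipeline` is an option the plugin already accepts.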
Hey again! Is it possible to use a Docker volume with this plugin, for credentials for example?
m
Right now, no. We’re open to accepting contributions on that part (if it’s possible with the SageMaker Pipelines SDK 🤔).
m
I'll take a closer look at this 👍
Another question, sorry 😅: is it possible to avoid writing temporary files to the S3 bucket `kedro-sagemaker-tmp`? I didn’t see anything about that in the docs.
There is some weird behavior with SageMaker. This is my pipeline:
```python
pipeline([
    node(..., name="split_data_node"),
    node(..., name="train_model_node"),
])
```
When I run `kedro run --node=split_data_node` locally, everything works. But when I run `kedro sagemaker run --pipeline=MyPipeline`, I get this error in the SageMaker logs:
```
ValueError: Pipeline does not contain nodes named ['split_data_node']
```
Does anyone know why? 🙏
m
Does this code show `MyPipeline` or the default one?
Have you rebuilt and pushed your image?
Try:
`kedro sagemaker run --pipeline=MyPipeline --auto-build -y`
Every time you make a change, you need to build and push a Docker image that contains those changes.
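The error means the pipeline loaded inside the container has no node with that name, which is consistent with an image built from an older version of the code. A hypothetical sanity check, in plain Python with made-up helper names, compares the node names you expect against a pipeline's node names and raises an error shaped like the one in the logs. On a real Kedro `Pipeline`, the names could come from `[n.name for n in pipeline.nodes]`.

```python
from typing import Iterable, List


def missing_node_names(pipeline_nodes: Iterable[str], requested: Iterable[str]) -> List[str]:
    """Return the requested node names that are absent from the pipeline, sorted."""
    return sorted(set(requested) - set(pipeline_nodes))


def assert_nodes_present(pipeline_nodes: Iterable[str], requested: Iterable[str]) -> None:
    """Raise a ValueError worded like the SageMaker log message if any node is missing."""
    missing = missing_node_names(pipeline_nodes, requested)
    if missing:
        raise ValueError(f"Pipeline does not contain nodes named {missing}")
```

For example, checking `["split_data_node"]` against a stale pipeline that only has `train_model_node` reproduces the logged message, which is why rebuilding the image is the first thing to try.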