# plugins-integrations
a
Hi guys, a question regarding `kedro-mlflow`. I am trying to implement a distributed architecture where the artifacts are uploaded to S3 and the metrics are logged to a database. I have a problem with a PyTorch weights file: it is not being uploaded to S3, but during the run it tries to access it and I get an error. I guess this is because it wants to access the file before it is uploaded. The file is saved to a local directory when I change the mlflow server config.
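A distributed setup like the one described usually means launching the tracking server with a database backend store for metrics/params and an S3 destination for artifacts, with artifact uploads proxied through the server. A minimal sketch, assuming a Postgres backend store and placeholder bucket, host, and credentials:

```bash
# Hypothetical tracking-server launch: metrics and params go to the backend
# store (Postgres here), artifacts are proxied by the server and written to S3.
mlflow server \
  --backend-store-uri postgresql://mlflow:mlflow@db-host:5432/mlflow \
  --artifacts-destination s3://my-artifact-bucket/mlflow \
  --host 0.0.0.0 \
  --port 5000
```

With proxied artifact access, clients upload through the server's `/api/2.0/mlflow-artifacts/...` endpoint (the same endpoint that appears in the error below) rather than writing to S3 directly.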
y
Hi, could you elaborate on the sentence "during the run it tries to access it and I get an error"? How does it try to access it? Do you use a `MlflowModelLoggerDataset` to load the model? Can you paste the stack trace?
a
Hi, thanks for the reply, here is the error:
Retrying (Retry(total=4, connect=5, read=4, redirect=5, status=5)) after connection broken by 'ProtocolError('Connection aborted.',     connectionpool.py:812
RemoteDisconnected('Remote end closed connection without response'))':
/api/2.0/mlflow-artifacts/artifacts/2/fcb684db5bf043b8bcb08a112de0c47f/artifacts/model/data/model.pth
I am using `PickleDataset` to save the model.
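For readers unfamiliar with the plugin, a catalog entry for such a model might look roughly like the sketch below; this is only one possible configuration (the entry name and filepath are placeholders, and the exact dataset class names depend on the installed kedro-mlflow and kedro-datasets versions), where the kedro-mlflow wrapper logs whatever the inner pickle dataset saves as an MLflow artifact:

```yaml
# Hypothetical catalog entry: MlflowArtifactDataset wraps a PickleDataset so the
# pickled PyTorch weights are logged as an MLflow artifact after saving locally.
model_weights:
  type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
  dataset:
    type: pickle.PickleDataset
    filepath: data/06_models/model.pth
```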
y
Could you elaborate on your setup? What does your `catalog` look like for this entry? Do you use the `pipeline_ml_factory` function? Do you have a custom hook, or just `kedro-mlflow` installed? How do you run kedro, through a notebook or the `kedro run` command? How is your `mlflow.yml` configured, especially your tracking server? It seems that you do not have the rights to log into mlflow. What happens if you use `mlflow.log_artifact(model_path)` in a notebook? Do you have the same error?
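The suggested notebook check could look something like the following sketch; the tracking URI and the weights path are placeholders and would need to match the actual server and file. If this plain upload also fails, the problem is between the client and the tracking server rather than in the kedro pipeline.

```python
# Minimal check that the tracking server accepts a large artifact upload,
# independently of kedro. The URI and file path below are placeholders.
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")  # replace with the real tracking server

with mlflow.start_run():
    mlflow.log_artifact("data/06_models/model.pth")  # path to the weights file
```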
a
@Yolan Honoré-Rougé Thanks for digging in. I figured out the issue. The problem was with the gunicorn workers that `mlflow server` runs under the hood. They have a response timeout, and since I was uploading large weight files to S3 the timeout was exceeded and the workers were killed by gunicorn. I fixed it by turning off the timeout.
👍 1
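For anyone hitting the same issue, one way to apply this fix is to forward a timeout option to the gunicorn processes when starting the server (the store and bucket URIs below are placeholders); gunicorn treats `--timeout 0` as disabling the worker timeout entirely, and raising it to a large value instead of 0 is a more conservative variant.

```bash
# Forward gunicorn options through mlflow server; --timeout 0 disables the worker
# timeout so long S3 uploads no longer get the worker killed mid-request.
mlflow server \
  --backend-store-uri postgresql://mlflow:mlflow@db-host:5432/mlflow \
  --artifacts-destination s3://my-artifact-bucket/mlflow \
  --gunicorn-opts "--timeout 0"
```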