Armen Paronikyan 03/27/2023, 3:24 PM
I am trying to implement a distributed architecture where artifacts are uploaded to S3 and metrics are logged to a database. I have a problem with a PyTorch weights file: it is not being uploaded to S3, but during the run something tries to access it and I get an error. I guess it tries to access the file before it has been uploaded. The file is saved to the local directory when I change the mlflow server config.
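Roughly, the server is started like this (a sketch of the setup described above, assuming a Postgres backend store; the bucket and database URIs are placeholders):
```
mlflow server \
  --backend-store-uri postgresql://user:password@db-host:5432/mlflow \
  --default-artifact-root s3://my-bucket/mlflow-artifacts \
  --host 0.0.0.0 --port 5000
```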
Yolan Honoré-Rougé 03/28/2023, 8:33 PM
How do you load the model? Can you paste the stack trace?
Armen Paronikyan 03/29/2023, 9:38 AM
connectionpool.py:812: Retrying (Retry(total=4, connect=5, read=4, redirect=5, status=5)) after connection broken by 'ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))'
I am using PickleDataset to save the model
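If the model goes through kedro-mlflow, the catalog entry would look roughly like this (a sketch, not necessarily the actual catalog; the dataset name and filepath are placeholders, and depending on the kedro / kedro-mlflow versions the classes may be spelled PickleDataSet / MlflowArtifactDataSet and the key data_set instead of dataset):
```
pytorch_model:
  type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
  dataset:
    type: pickle.PickleDataset
    filepath: data/06_models/model.pkl
```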
Yolan Honoré-Rougé 03/29/2023, 7:47 PM
What does your catalog.yml look like for this entry? Do you use the MlflowArtifactDataSet wrapper? Do you have a custom hook, or just kedro-mlflow installed? How do you run kedro: through a notebook or the kedro run command? How is your mlflow.yml configured, and especially your tracking server? It seems that you are not able to log to mlflow correctly. What happens if you use mlflow.log_artifact in a notebook? Do you have the same error?
Armen Paronikyan 03/30/2023, 11:27 AM
I found the problem: mlflow server runs gunicorn under the hood. Gunicorn workers have a response timeout, and since I was uploading large weight files to S3, the timeout was exceeded and the workers were killed by gunicorn. I fixed it by turning off the timeout.
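Concretely, the fix looks something like this (a sketch; the URIs are placeholders, and gunicorn's --timeout 0 disables the worker timeout entirely):
```
mlflow server \
  --backend-store-uri postgresql://user:password@db-host:5432/mlflow \
  --default-artifact-root s3://my-bucket/mlflow-artifacts \
  --gunicorn-opts "--timeout 0"
```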