Armen Paronikyan 03/27/2023, 3:24 PM
I am trying to implement a distributed architecture where artifacts are uploaded to S3 and metrics are logged to a database. I have a problem with a PyTorch weights file: it is not being uploaded to S3, but during the run something tries to access it and I get an error. I guess it tries to access the file before it has been uploaded. The file is saved to the local directory when I change the mlflow server config.
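Roughly, the server is started like this (a sketch of the setup described above, assuming a Postgres backend store; the bucket and database URIs are placeholders):
```
mlflow server \
  --backend-store-uri postgresql://user:password@db-host:5432/mlflow \
  --default-artifact-root s3://my-bucket/mlflow-artifacts \
  --host 0.0.0.0 --port 5000
```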
Yolan Honoré-Rougé 03/28/2023, 8:33 PM
How do you load the model? Can you paste the stack trace?
Armen Paronikyan 03/29/2023, 9:38 AM
connectionpool.py:812: Retrying (Retry(total=4, connect=5, read=4, redirect=5, status=5)) after connection broken by 'ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))'
I am using PickleDataset to save the model
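If the model goes through kedro-mlflow, the catalog entry would look roughly like this (a sketch, not necessarily the actual catalog; the dataset name and filepath are placeholders, and depending on the kedro / kedro-mlflow versions the classes may be spelled PickleDataSet / MlflowArtifactDataSet and the key data_set instead of dataset):
```
pytorch_model:
  type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
  dataset:
    type: pickle.PickleDataset
    filepath: data/06_models/model.pkl
```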
Yolan Honoré-Rougé 03/29/2023, 7:47 PM
What does your catalog.yml look like for this entry? Do you use the MlflowArtifactDataSet wrapper? Do you have a custom hook, or just kedro-mlflow installed? How do you run kedro: through a notebook or the kedro run command? How is your mlflow.yml configured, and especially your tracking server? It seems that you are not able to log to mlflow correctly. What happens if you use mlflow.log_artifact in a notebook? Do you have the same error?
Armen Paronikyan 03/30/2023, 11:27 AM
I found the problem: mlflow server runs gunicorn under the hood. Gunicorn workers have a response timeout, and since I was uploading large weight files to S3, the timeout was exceeded and the workers were killed by gunicorn. I fixed it by turning off the timeout.
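Concretely, the fix looks something like this (a sketch; the URIs are placeholders, and gunicorn's --timeout 0 disables the worker timeout entirely):
```
mlflow server \
  --backend-store-uri postgresql://user:password@db-host:5432/mlflow \
  --default-artifact-root s3://my-bucket/mlflow-artifacts \
  --gunicorn-opts "--timeout 0"
```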