# questions
João Dias
Hi everyone! I'm having trouble using `tensorflow.TensorFlowModelDataset` with an S3 bucket. The model saves fine locally, but when I configure it to save/load directly from S3, it doesn't work. Some key points:
• Credentials are fine – I can load other datasets (preprocessing outputs and split data) from S3 without issues.
• Uploading manually works – if I explicitly upload the model file using `boto3` or another script, I can access it in S3 just fine.
• Had issues with `.h5` models – initially I could retrieve `.h5` files from S3, but loading was not working properly, so I switched to the `.keras` format, which works fine when handling files manually.
Has anyone successfully used `tensorflow.TensorFlowModelDataset` with S3? Is there a recommended workaround or configuration to get it working? Any insights would be much appreciated!
To make it clearer: I am only having problems when the node output - the model - is pointed to S3. I am getting "Access Denied" even after checking credentials, IAM policies, and testing with a manual script.
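Since the manual `boto3` uploads work, one way to narrow this down is to try the same write through fsspec/s3fs, which Kedro datasets use internally. A rough sketch, with a placeholder bucket path and placeholder credentials standing in for whatever `dev_s3` holds:
```python
import fsspec

# Build an s3fs filesystem the way Kedro would, using the same credentials
# as dev_s3 (the key/secret values below are placeholders).
fs = fsspec.filesystem(
    "s3",
    key="YOUR_ACCESS_KEY_ID",
    secret="YOUR_SECRET_ACCESS_KEY",
)

# Try writing a small test object to the same prefix the model should go to.
with fs.open("my-bucket/data/06_models/_write_test.txt", "w") as f:
    f.write("hello")

print(fs.exists("my-bucket/data/06_models/_write_test.txt"))
```
If this write also fails with "Access Denied", the problem is in how the credentials reach s3fs rather than in the dataset itself.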
Elena Khaustova
Hi @João Dias, can you please clarify whether this is happening only on save and whether you can load the model successfully?
> I am only having problems when the node output - the model - is pointed to S3.
João Dias
@Elena Khaustova I will get back to you ASAP, because I have run into other issues and have lost track of things. For now, the problem was with both saving and loading from S3. I am beginning to suspect it is my Keras version.
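A quick way to rule out a TensorFlow/Keras mismatch is to print the installed versions (just a sketch):
```python
import tensorflow as tf

# Print the bundled Keras version alongside TensorFlow's;
# useful when chasing .keras / .h5 save and load issues.
print("tensorflow:", tf.__version__)
print("keras:", tf.keras.__version__)
```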
Elena Khaustova
When saving/loading, `TensorFlowModelDataset` first writes the model to a local temporary directory:
```python
def save(self, data: tf.keras.Model) -> None:
    save_path = get_filepath_str(self._get_save_path(), self._protocol)

    with tempfile.TemporaryDirectory(prefix=self._tmp_prefix) as tempdir:
        if self._is_h5:
            path = str(PurePath(tempdir) / TEMPORARY_H5_FILE)  # noqa: PLW2901
        else:
            # We assume .keras
            path = str(PurePath(tempdir) / TEMPORARY_KERAS_FILE)  # noqa: PLW2901

        tf.keras.models.save_model(data, path, **self._save_args)

        # Use fsspec to take from local tempfile directory/file and
        # put in ArbitraryFileSystem
        self._fs.put(path, save_path)
```
Can you double-check that `save_path` is what you expect it to be and that a local copy of the model is created on save?
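One way to check this outside Kedro is to reproduce the same two steps by hand: save to a local temporary file, then `put` it to the S3 path with fsspec. A rough sketch, with a placeholder model and bucket path, assuming TensorFlow and s3fs are installed and credentials come from the environment:
```python
import tempfile
from pathlib import PurePath

import fsspec
import tensorflow as tf

# Tiny stand-in model; any trained model would do.
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)])

save_path = "my-bucket/data/06_models/lstm_model.keras"  # placeholder S3 target
fs = fsspec.filesystem("s3")  # credentials picked up from the environment here

with tempfile.TemporaryDirectory() as tempdir:
    local_path = str(PurePath(tempdir) / "tmp_model.keras")
    tf.keras.models.save_model(model, local_path)  # step 1: write local temp copy
    fs.put(local_path, save_path)                  # step 2: upload via fsspec
    print("uploaded:", fs.exists(save_path))
```
If step 1 succeeds but step 2 fails, the "Access Denied" is coming from the fsspec upload rather than from the model serialisation.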
João Dias
I'm back with more info. I'm using Kedro to save a TensorFlow model as `.keras` directly to S3, but I'm getting an "Access Denied" error.
• Saving both `.h5` and `.keras` locally as node outputs works fine, and the `file` command confirms they are correctly saved.
• I was able to manually upload both files to S3, download them, and load them successfully in TensorFlow.
• I can read and write other datasets to the same S3 bucket without issues, including a `.pkl` scaler in the same directory where I want to store the models.
• However, when I modify `catalog.yml` to save `.keras` directly to S3, Kedro throws "Access Denied", even though `.h5` and other datasets uploaded fine through a script, and other datasets were saved to the same bucket during preprocessing.
• IAM permissions should not be the issue, since other S3 writes work.
I assume this setup is also correct:
```yaml
lstm_model:
  type: tensorflow.TensorFlowModelDataset
  filepath: s3://my-bucket/data/06_models/lstm_model.keras
  credentials: dev_s3
```
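For reference, the `dev_s3` entry in `credentials.yml` would typically look something like the sketch below; the values are placeholders and the exact keys depend on how the credentials are supplied, since Kedro passes them through to s3fs:
```yaml
# Placeholder values, forwarded to s3fs by the dataset.
dev_s3:
  key: YOUR_ACCESS_KEY_ID
  secret: YOUR_SECRET_ACCESS_KEY
```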
Elena Khaustova
This looks like a bug, and I suspect it relates to the temporary directories created on save (see the message above). I can't test it with S3 right now, but if you try the suggestion above, it might help to understand the reason for the problem. Otherwise, feel free to open an issue so we can investigate it.
João Dias
Thank you for the support! I will open an issue, and I am happy to contribute as much as I can to resolve this.