# questions
r
Hey, while working with Kedro on a MacBook M1 I came across an issue in https://docs.kedro.org/en/stable/_modules/kedro/extras/datasets/tensorflow/tensorflow_model_dataset.html#TensorFlowModelDataset Inside `_load`, it tries to load from a temporary directory that was created during `_save`. However, at least on my computer, that temporary directory doesn't exist. If I patch the implementation of `TensorFlowModelDataset._load` so that it loads from `_fs` instead of the temporary path, everything works fine, since `_save` copies the model over to `_fs`. Just putting this here to ask a few questions:
1. Should I create a PR that updates `TensorFlowModelDataset._load` to not use the temporary directory?
2. Can somebody with a non-M1 machine try to reproduce the error, to see if it's an issue with this particular hardware? I'm happy to help with that as well.
3. If it is hardware-specific, should the PR create a new dataset, something like `TensorFlowModelDatasetM1`, for this special case?
Note that I'm relatively new to Kedro, so if this is the wrong place to ask/discuss, let me know. Thanks!
m
Hi Ryan! Thanks for reaching out, it’s great to see you here 😄 I will have a look at reproducing this on my machine, which is not an M1 and let you know what the best way is to resolve this.
The `TensorFlowModelDataSet` is working fine for me. My suggestion would be not to create a completely new dataset, but to add a check for the system type in the `load` and `save` methods and handle it differently if it’s `arm`.
r
Is there a Kedro starter that uses `TensorFlowModelDataSet` that I can test with, instead of my own custom case? I also want to make sure that this is 100% due to hardware differences, since another reason it could work for you and not for me at the moment is that I'm using a custom TensorFlow model wrapped with `@tf.keras.utils.register_keras_serializable()`. Once I confirm, I can open a PR with the change you suggested, where I check for `arm` as the platform. And if it does have something to do with the custom model, we can check whether that also happens on Windows.
m
We don’t have a starter that uses that dataset unfortunately. I tested it with a super simple bit of code in a notebook:
from kedro_datasets.tensorflow import TensorFlowModelDataSet
import tensorflow as tf
import numpy as np

data_set = TensorFlowModelDataSet("data/06_models/tensorflow_model.h5")

inputs = tf.keras.Input(shape=(3,))
x = tf.keras.layers.Dense(4, activation=tf.nn.relu)(inputs)
outputs = tf.keras.layers.Dense(5, activation=tf.nn.softmax)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

input_data = np.array([[0.5, 0.3, 0.2]])
predictions = model.predict(input_data)
data_set.save(model)

loaded_model = data_set.load()
input_data2 = np.array([[0.8, 0.4, 0.8]])
new_predictions = loaded_model.predict(input_data2)
np.testing.assert_allclose(predictions, new_predictions, rtol=1e-6, atol=1e-6)
r
Thank you. If I try and run the exact same code, I'll get the following error:
OSError: SavedModel file does not exist at: /var/folders/bc/33y59_xn6yd3smds_z3wsl9r0000gp/T/kedro_tensorflow_tmpzqej8lam/{saved_model.pbtxt|saved_model.pb}
So I'm assuming this is a hardware issue. One quick question: why do you have `input_data != input_data2`? If you're testing the loaded model, shouldn't the input data be the same? Anyway, I've confirmed that updating the dataset so that it skips the temporary files under `if platform.processor() == "arm":` works. Is there a particular way that y'all do PRs? Or can I just clone, create a branch, and open a PR?
Also @Merel what is your python and tensorflow version? I'm on python 3.10 and tf 2.13
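For reference, a minimal sketch of the kind of platform check being discussed here. The helper names `looks_like_arm` and `running_on_arm` are hypothetical and not part of Kedro. It also consults `platform.machine()`, on the assumption that `platform.processor()` can come back empty on some systems; whether hardware is really the cause is still an open question at this point in the thread.

```python
import platform


def looks_like_arm(machine: str, processor: str) -> bool:
    """Return True if either identifier suggests an ARM CPU.

    Hypothetical helper: the prefixes checked here are an assumption
    about common values (e.g. "arm64", "aarch64", "arm").
    """
    machine = machine.lower()
    processor = processor.lower()
    return machine.startswith(("arm", "aarch64")) or processor.startswith("arm")


def running_on_arm() -> bool:
    """Apply the check to the current interpreter's platform."""
    # platform.processor() may be an empty string on some systems,
    # so platform.machine() is checked as well.
    return looks_like_arm(platform.machine(), platform.processor())


if __name__ == "__main__":
    print(platform.machine(), platform.processor(), running_on_arm())
```

Keeping the string logic in a pure function makes it possible to test the check on a machine that is not ARM.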
m
TensorFlow version is `tensorflow==2.12.0` and Python is 3.9.
w
@Ryan Saxe can you do an os.listdir() inside the temp directory? I’m seeing a similar error in Linux. In my case fsspec is creating a directory inside the temp folder and copying the model files inside, causing the load operation to fail.
r
@William Caicedo when you say "inside the temp directory", do you just mean `os.listdir(path)` inside the `with` statement in `TensorFlowModelDataset._save`? Prior to calling `tf.keras.models.save_model` it returns an empty list. Afterwards it returns `['fingerprint.pb', 'keras_metadata.pb', 'variables', 'saved_model.pb', 'assets']`
w
@Ryan Saxe I meant inside `_load`. The temporary path is needed because it is possible that the model is being stored remotely (e.g. S3).
r
@William Caicedo I cannot execute `os.listdir(path)` inside `_load`, because the statement `with tempfile.TemporaryDirectory(prefix=self._tmp_prefix) as path:` generates the error, hence I never have access to `path`
@William Caicedo right before the `with` statement I can execute `os.listdir(load_path)`, where `load_path` points to the path specified in the data catalog. This works properly and returns `['fingerprint.pb', 'keras_metadata.pb', 'variables', 'saved_model.pb', 'assets']`, because `_save` copies the files from the temp directory into the directory specified in the data catalog.
w
Is the problem then with `tempfile.TemporaryDirectory`?
r
@William Caicedo my apologies, the error was not coming from that line; I made a mistake when looking at my logs. `os.listdir(path)` yields `['tensorflow_model.h5']` inside `_load`.
w
I’m a bit confused, because sometimes you refer to the `.h5` format, which is a single file, and other times to the `SavedModel` format, which is a directory with multiple files.
r
@William Caicedo I'm assuming that's a large part of this bug. When I do `os.listdir(path)` inside `_save`, I see the `SavedModel` format. But when I do `os.listdir(path)` inside `_load`, I see the `.h5` format.
w
Try setting `save_format` to either `h5` or `tf` under both `load_args` and `save_args` to see what happens.
r
@William Caicedo there is no `save_format` in the kwargs for `tf.keras.models.load_model`, so I cannot specify that. Setting `save_format` to `h5` in `save_args` yields a very similar error:
DatasetError: Failed while loading data from data set TensorFlowModelDataset(filepath=data/06_models/tensorflow_model.h5, protocol={}, save_args={'save_format': h5}).
No file or directory found at /var/folders/bc/33y59_xn6yd3smds_z3wsl9r0000gp/T/kedro_tensorflow_tmpy73clzrf/tmp_tensorflow_model.h5
@William Caicedo okay, now I'm confident about what is happening, as odd as this is. In `_load`, `path` points to the temporary directory. That directory contains the folder that contains the model, rather than the model files themselves. So when I tell Keras to load from that `path`, it can't. But if I ask it to load from `path/tensorflow_model.h5`, this actually works.
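The layout described above can be reproduced without TensorFlow or fsspec. This is a minimal sketch using only the standard library: `find_saved_model_dir` and `simulate_nested_copy` are hypothetical helpers, not part of Kedro, and `shutil.copytree` merely stands in for whatever copy `_load` performs. It only illustrates why loading from the top of the temporary directory fails while loading from the nested folder succeeds.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def find_saved_model_dir(root: str) -> Optional[str]:
    """Return the first directory at or below root that holds a SavedModel file."""
    markers = {"saved_model.pb", "saved_model.pbtxt"}
    for dirpath, _dirnames, filenames in os.walk(root):
        if markers.intersection(filenames):
            return dirpath
    return None


def simulate_nested_copy() -> tuple:
    """Copy a fake model directory *into* a temp dir and report the layout."""
    with tempfile.TemporaryDirectory() as src_parent:
        # Fake model directory, named like the catalog entry in the thread.
        src = Path(src_parent) / "tensorflow_model.h5"
        src.mkdir()
        (src / "saved_model.pb").write_bytes(b"")

        with tempfile.TemporaryDirectory(prefix="kedro_tensorflow_tmp") as tmp:
            # The model folder ends up one level below the temp dir.
            shutil.copytree(src, Path(tmp) / src.name)

            top_level = sorted(os.listdir(tmp))
            found = find_saved_model_dir(tmp)
            relative = os.path.relpath(found, tmp) if found else None
            return top_level, relative


if __name__ == "__main__":
    print(simulate_nested_copy())
```

Walking the temporary directory for the marker file is one possible workaround; whether the real fix belongs in the copy call or in the load path is not settled in this thread.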
w
Yep, that looks like what I’ve experienced recently. Maybe something changed in how `fsspec` works.
r
Checking in on this again . . . what `fsspec` versions are people on? I'm on
>>> fsspec.__version__
'2023.9.0'
Either this is the wrong version and the reqs/toml should be more specific, or there's something to be patched.
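To make version reports easier to compare, here is a small sketch that reads the installed `fsspec` version and places it relative to the two versions mentioned in this thread (`2023.1.0` and `2023.9.0`). The helpers `parse_version`, `installed_version` and `describe` are hypothetical, and the thread does not establish which release changed behaviour, so the labels only describe the position of a version and make no claim about whether it is affected.

```python
import re
from importlib import metadata
from typing import Optional

# Versions mentioned in this thread; nothing in between was tested here.
OLDER_REPORTED = "2023.1.0"
NEWER_REPORTED = "2023.9.0"


def parse_version(text: str) -> tuple:
    """Turn '2023.9.0' into (2023, 9, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def describe(version: str) -> str:
    """Place a version relative to the two versions mentioned in the thread."""
    parsed = parse_version(version)
    if parsed <= parse_version(OLDER_REPORTED):
        return "at or below the older reported version"
    if parsed >= parse_version(NEWER_REPORTED):
        return "at or above the newer reported version"
    return "between the two reported versions"


if __name__ == "__main__":
    version = installed_version("fsspec")
    print(version, describe(version) if version else "fsspec is not installed")
```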
j
Interesting. I failed to reproduce this issue, but I was on an older version, `fsspec==2023.1.0`. Will give this another try.
okay I can reproduce. opening an issue now