# plugins-integrations
w
Another `kedro-sagemaker` question: I managed to get the pipeline showing in Processing jobs, but then I get an `Error: No such command 'sagemaker'.` error. I have `kedro-sagemaker` in my `requirements.txt` file and I’m building and pushing the image myself, so I just do a `kedro sagemaker run`. Any ideas what I’m doing wrong?
m
You have to be in the folder with the Kedro project (`cd` to the folder that contains the `src` and `conf` folders).
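For reference, a sketch of a default `kedro new` project root (the name is made up and exact contents vary by starter and Kedro version); `kedro sagemaker run` should be invoked from this top-level folder:
```
my-project/        # hypothetical project name
├── conf/          # configuration, typically including sagemaker.yml
├── data/
├── notebooks/
├── src/           # the Python package with the pipeline code
└── pyproject.toml
```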
w
I think I’m in the right location. I should’ve also said that the error appears in the CloudWatch logs.
It’s looking like this:
m
If it happens in CloudWatch, then please verify the Docker image / Dockerfile, especially the entrypoint and workdir.
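A quick way to verify both from outside a running container (a sketch; `<your-image>` is a placeholder for the tag you build and push):
```sh
# Print the entrypoint and working directory baked into the image
docker inspect --format '{{json .Config.Entrypoint}} {{.Config.WorkingDir}}' <your-image>
# For kedro-sagemaker the entrypoint should be ["kedro","sagemaker","entrypoint"]
```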
w
```
# Do not change the default entrypoint, it will break the Kedro SageMaker integration!
ENTRYPOINT ["kedro", "sagemaker", "entrypoint"]
working_directory: /home/kedro
```
I’m running out of ideas here :sadcat: Everything seems correct inside the image: `kedro run` works, but not `kedro sagemaker`.
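One more probe that might help here: `kedro info` prints the plugins Kedro has actually discovered, so running it inside the container should show whether `kedro-sagemaker` is registered (a suggestion; a missing entry would explain the missing `sagemaker` subcommand):
```sh
# List the Kedro version and any discovered plugins;
# kedro-sagemaker should appear in the plugin list.
kedro info
```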
m
Have you added `kedro-sagemaker` to the `requirements.txt`?
w
Yes, but the error kept showing. This is my `requirements.txt`:
```
black~=22.0
flake8>=3.7.9, <4.0
ipython>=7.31.1, <8.0
isort~=5.0
jupyter~=1.0
jupyterlab~=3.0
kedro==0.18.8
kedro-mlflow
kedro-datasets[spark.SparkDataSet, tensorflow.TensorFlowModelDataSet, pickle.PickleDataSet]~=1.4.0
kedro-sagemaker
nbstripout~=0.4
pyarrow
pymc-marketing
pyspark==3.3.0
pytest-cov~=3.0
pytest-mock>=1.7.1, <2.0
pytest~=7.2
scikit-learn
tensorflow==2.15.0
tensorflow-probability==0.22.1
tensorflow_io
```
m
Ok, it will be difficult to debug. A few options:
• verify whether `pip freeze | grep sagemaker` in the Docker image shows the plugin installed (see the sketch below)
• change the workdir in the Docker image to something else, something that doesn’t have `kedro` or `kedro_sagemaker` / `kedro-sagemaker` in the name (make sure to also update the `sagemaker.yml` accordingly)
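A sketch of that first check, run against the built image without triggering its baked-in entrypoint (`<your-image>` is a placeholder):
```sh
# Override the entrypoint so we can run an arbitrary command in the image
docker run --rm --entrypoint "" <your-image> pip freeze | grep sagemaker
```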
w
```
root@bc309858e78f:/home/kedro# pip freeze | grep sagemaker
kedro-sagemaker==0.0.1
```
I think I know what’s going on: there is a dependency clash between `kedro-sagemaker` and `kedro-datasets`, with `s3fs` being the problem. If I pin the version of `kedro-sagemaker` to `0.3.0`, I get the following error:
```
#9 75.16     kedro-sagemaker 0.3.0 depends on s3fs<2023.0.0 and >=2022.11.0
#9 75.16     kedro-datasets[pickle-pickledataset,spark-sparkdataset,tensorflow-tensorflowmodeldataset] 1.4.0 depends on s3fs<0.5 and >=0.3.0; extra == "spark.sparkdataset"
```
So if the version is not pinned in the `requirements.txt` file, the only resolution pip can find is to install v0.0.1 of `kedro-sagemaker`, which causes my original error.
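The clash can be reproduced outside the Docker build (a sketch using the pins from the thread; pip’s exact wording may differ):
```sh
# pip's resolver should fail with ResolutionImpossible,
# because the two s3fs ranges (<0.5 vs >=2022.11.0) do not overlap.
pip install "kedro-sagemaker==0.3.0" \
  "kedro-datasets[spark.SparkDataSet,tensorflow.TensorFlowModelDataSet,pickle.PickleDataSet]==1.4.0"
```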
m
Great, that’s the root cause then. kedro-sagemaker has a requirement of `s3fs = "^2022.11.0"`. @Nok Lam Chan / @datajoely do you know why `kedro-datasets[spark-sparkdataset]` has a strict limit on an old version of `s3fs`? The range `>=0.3.0,<0.5` is from 2019-2020 😮
@William Caicedo what you could try is to drop `spark-sparkdataset` from the `kedro-datasets` extras and install those dependencies separately, so in the `requirements.txt` you would have:
```
kedro-sagemaker~=0.3.0
kedro-datasets[pickle-pickledataset,tensorflow-tensorflowmodeldataset]~=1.4.0
s3fs
hdfs>=2.5.8, <3.0
```
and see what happens there.
On our side, we will remove version `0.0.1` to avoid future problems like that.
w
@marrrcin Thanks for the help! I forked `kedro-datasets` and bumped the `s3fs` version to `2022.11.0` just to check, and the pipeline worked with no issues. I haven’t run any of the `kedro-datasets` tests yet though. Also, I got a JSON serialization error when I had some dates as parameters in my `parameters.yml`. The workaround was of course to put quotes around them and treat them as strings. Have you seen that error before?
🥳 1
m
Haven’t seen that one
w
```
/Users/williamc/miniconda3/envs/clv/lib/python3.10/site-packages/kedro_sagemaker/generator.py:98 │
│ in _prepare_sagemaker_params                                                                     │
│                                                                                                  │
│    95 │   │   │   │   sm_param_value = sm_param_types[t](value_name, default_value=v)            │
│    96 │   │   │   else:                                                                          │
│    97 │   │   │   │   sm_param_value = ParameterString(                                          │
│ ❱  98 │   │   │   │   │   value_name, default_value=json.dumps(v)                                │
│    99 │   │   │   │   )                                                                          │
│   100 │   │   │                                                                                  │
│   101 │   │   │   sm_kedro_params.append(sm_param_key)                                           │
│                                                                                                  │
│ /Users/williamc/miniconda3/envs/clv/lib/python3.10/json/__init__.py:231 in dumps                 │
│                                                                                                  │
│   228 │   │   check_circular and allow_nan and                                                   │
│   229 │   │   cls is None and indent is None and separators is None and                          │
│   230 │   │   default is None and not sort_keys and not kw):                                     │
│ ❱ 231 │   │   return _default_encoder.encode(obj)                                                │
│   232 │   if cls is None:                                                                        │
│   233 │   │   cls = JSONEncoder                                                                  │
│   234 │   return cls(                                                                            │
│                                                                                                  │
│ /Users/williamc/miniconda3/envs/clv/lib/python3.10/json/encoder.py:199 in encode                 │
│                                                                                                  │
│   196 │   │   # This doesn't pass the iterator directly to ''.join() because the                 │
│   197 │   │   # exceptions aren't as detailed.  The list call should be roughly                  │
│   198 │   │   # equivalent to the PySequence_Fast that ''.join() would do.                       │
│ ❱ 199 │   │   chunks = self.iterencode(o, _one_shot=True)                                        │
│   200 │   │   if not isinstance(chunks, (list, tuple)):                                          │
│   201 │   │   │   chunks = list(chunks)                                                          │
│   202 │   │   return ''.join(chunks)                                                             │
│                                                                                                  │
│ /Users/williamc/miniconda3/envs/clv/lib/python3.10/json/encoder.py:257 in iterencode             │
│                                                                                                  │
│   254 │   │   │   │   markers, self.default, _encoder, self.indent, floatstr,                    │
│   255 │   │   │   │   self.key_separator, self.item_separator, self.sort_keys,                   │
│   256 │   │   │   │   self.skipkeys, _one_shot)                                                  │
│ ❱ 257 │   │   return _iterencode(o, 0)                                                           │
│   258                                                                                            │
│   259 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,                      │
│   260 │   │   _key_separator, _item_separator, _sort_keys, _skipkeys, _one_shot,                 │
│                                                                                                  │
│ /Users/williamc/miniconda3/envs/clv/lib/python3.10/json/encoder.py:179 in default                │
│                                                                                                  │
│   176 │   │   │   │   return JSONEncoder.default(self, o)                                        │
│   177 │   │                                                                                      │
│   178 │   │   """                                                                                │
│ ❱ 179 │   │   raise TypeError(f'Object of type {o.__class__.__name__} '                          │
│   180 │   │   │   │   │   │   f'is not JSON serializable')                                       │
│   181 │                                                                                          │
│   182 │   def encode(self, o):                                                                   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: Object of type date is not JSON serializable
```
n
I think it’s mostly related to tests: we use `moto` and it has problems with newer versions of `s3fs`, which prevents us from bumping the version.
m
Thx!
@William Caicedo - parameters must be JSON serializable for the plugin
👍 1
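For context, a minimal sketch of the failure mode: the traceback above ends in `json.dumps(v)` inside `kedro_sagemaker/generator.py`, and YAML parses unquoted ISO dates into `datetime.date` objects, which the stdlib JSON encoder rejects. The parameter name below is made up for illustration:
```python
import datetime
import json

import yaml

# YAML 1.1 resolves a bare ISO date to datetime.date; quoting keeps it a str.
params = yaml.safe_load("""
start_date: 2023-01-01            # parsed as a date object
start_date_quoted: "2023-01-01"   # stays a plain string
""")

assert isinstance(params["start_date"], datetime.date)

json.dumps(params["start_date_quoted"])  # works: '"2023-01-01"'
json.dumps(params["start_date"])  # TypeError: Object of type date is not JSON serializable
```
Quoting the date in `parameters.yml` and parsing it inside the node (e.g. with `datetime.date.fromisoformat`) is exactly the workaround described above.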