William Caicedo (11/15/2023, 10:32 PM):
kedro-sagemaker question: I managed to get the pipeline showing up in Processing jobs, but then I get an "Error: No such command 'sagemaker'." error. I have kedro-sagemaker in my requirements.txt file and I'm building and pushing the image myself, so I just do a kedro sagemaker run. Any ideas what I'm doing wrong?
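[Editor's note: the "No such command 'sagemaker'" message means Click never registered the plugin's command group. Kedro discovers plugin CLI commands through Python entry points, so one quick check inside the image is to list them. A minimal sketch, assuming Python 3.10+ and the kedro.project_commands entry-point group that Kedro plugins register under:

# Sketch: list CLI command groups contributed by installed Kedro plugins.
# If kedro-sagemaker is installed correctly, a "sagemaker" entry should
# appear in the output.
from importlib.metadata import entry_points

for ep in entry_points(group="kedro.project_commands"):
    print(ep.name, "->", ep.value)
]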
marrrcin (11/15/2023, 11:30 PM):
Are you running it from within the project directory (cd to the folder with src and conf folders)?

William Caicedo (11/15/2023, 11:32 PM):
[…]

marrrcin (11/16/2023, 8:23 AM):
[…]

William Caicedo (11/16/2023, 11:00 PM):
My Dockerfile keeps the default entrypoint:

# Do not change the default entrypoint, it will break the Kedro SageMaker integration!
ENTRYPOINT ["kedro", "sagemaker", "entrypoint"]
and the working directory is set to:

working_directory: /home/kedro

kedro run works, but kedro sagemaker does not.

marrrcin (11/17/2023, 8:54 AM):
Have you added kedro-sagemaker to the requirements.txt?

William Caicedo (11/17/2023, 9:01 AM):
Yes:

black~=22.0
flake8>=3.7.9, <4.0
ipython>=7.31.1, <8.0
isort~=5.0
jupyter~=1.0
jupyterlab~=3.0
kedro==0.18.8
kedro-mlflow
kedro-datasets[spark.SparkDataSet, tensorflow.TensorFlowModelDataSet, pickle.PickleDataSet]~=1.4.0
kedro-sagemaker
nbstripout~=0.4
pyarrow
pymc-marketing
pyspark==3.3.0
pytest-cov~=3.0
pytest-mock>=1.7.1, <2.0
pytest~=7.2
scikit-learn
tensorflow==2.15.0
tensorflow-probability==0.22.1
tensorflow_io
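[Editor's note: kedro-sagemaker appears in this list with no version specifier, which gives pip's resolver freedom to fall back to arbitrarily old releases, a detail that becomes relevant below. A hypothetical helper to flag such lines, assuming the file contains only plain PEP 508 requirement lines and # comments (no -r includes):

# Sketch: flag unpinned entries in requirements.txt; unpinned packages are
# the ones pip's backtracking resolver may silently downgrade.
from packaging.requirements import Requirement

with open("requirements.txt") as f:
    for line in f:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        req = Requirement(line)
        if not req.specifier:
            print("unpinned:", req.name)
]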
marrrcin (11/17/2023, 9:39 AM):
• check whether pip freeze | grep sagemaker in the docker image shows the plugin installed
• change the workdir in the docker image to something else - something that doesn't have kedro or kedro_sagemaker / kedro-sagemaker in the name (make sure to also update the sagemaker.yml accordingly)
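[Editor's note: the second suggestion targets module shadowing: if the working directory contains a folder named kedro or kedro_sagemaker, Python can import it instead of the installed packages. A quick check, as a sketch to run inside the container:

# Sketch: verify imports resolve to site-packages, not to a same-named
# folder in the working directory.
import kedro
print(kedro.__file__)  # expect .../site-packages/kedro/__init__.py

try:
    import kedro_sagemaker
    print(kedro_sagemaker.__file__)
except ImportError as err:
    print("plugin not importable:", err)
]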
William Caicedo (11/19/2023, 8:36 PM):
root@bc309858e78f:/home/kedro# pip freeze | grep sagemaker
kedro-sagemaker==0.0.1
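[Editor's note: so the plugin is installed, but pip resolved it to 0.0.1, an early release that apparently predates the sagemaker CLI commands. A hedged build-time guard could surface this during the image build rather than at runtime; the 0.3.0 floor below is an assumption based on the version discussed later in the thread:

# Sketch: fail the build fast if pip backtracked to an old kedro-sagemaker.
from importlib.metadata import version
from packaging.version import Version

v = Version(version("kedro-sagemaker"))
assert v >= Version("0.3.0"), f"resolved kedro-sagemaker {v}; check for dependency conflicts"
]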
Looks like a dependency conflict between kedro-sagemaker and kedro-datasets, with s3fs being the problem. If I pin kedro-sagemaker to 0.3.0 I get the following error:

#9 75.16 kedro-sagemaker 0.3.0 depends on s3fs<2023.0.0 and >=2022.11.0
#9 75.16 kedro-datasets[pickle-pickledataset,spark-sparkdataset,tensorflow-tensorflowmodeldataset] 1.4.0 depends on s3fs<0.5 and >=0.3.0; extra == "spark.sparkdataset"

So if the version is not pinned in the requirements.txt file, the only resolution pip can find is to install v0.0.1 of kedro-sagemaker, which causes my original error.
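[Editor's note: the two ranges in that error are mutually exclusive, so the backtracking resolver keeps walking down kedro-sagemaker releases until it finds one (0.0.1) with no conflicting s3fs requirement. The empty intersection is easy to confirm with the packaging library:

# Sketch: no s3fs release can satisfy both constraints from the error above.
from packaging.specifiers import SpecifierSet

from_sagemaker = SpecifierSet(">=2022.11.0,<2023.0.0")  # kedro-sagemaker 0.3.0
from_datasets = SpecifierSet(">=0.3.0,<0.5")            # kedro-datasets spark extra

combined = from_sagemaker & from_datasets
print("2022.11.0" in combined)  # False
print("0.4.2" in combined)      # False
]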
marrrcin (11/20/2023, 8:21 AM):
Yes, kedro-sagemaker 0.3.0 declares s3fs = "^2022.11.0".
@Nok Lam Chan / @datajoely do you know why kedro-datasets[spark-sparkdataset] has a strict limit on an old version of s3fs? The range >=0.3.0,<0.5 is from 2019-2020 😮
In the meantime, I suggest dropping spark-sparkdataset from the kedro-datasets extras and installing its dependencies separately, so in the requirements.txt you would have:

kedro-sagemaker~=0.3.0
kedro-datasets[pickle-pickledataset,tensorflow-tensorflowmodeldataset]~=1.4.0
s3fs
hdfs>=2.5.8, <3.0

and see what happens there. I'll also yank the 0.0.1 release to avoid future problems like that.
William Caicedo (11/20/2023, 8:33 AM):
I forked kedro-datasets and bumped the s3fs version to 2022.11.0 just to check, and the pipeline worked with no issues. I haven't run any of the kedro-datasets tests yet, though.
Also, I got a JSON serialization error when I had some dates as parameters in my parameters.yml. The workaround was of course to put quotes around them and treat them as strings. Have you seen that error before?

marrrcin (11/20/2023, 8:36 AM):
[…]

William Caicedo (11/20/2023, 8:43 AM):
│ /Users/williamc/miniconda3/envs/clv/lib/python3.10/site-packages/kedro_sagemaker/generator.py:98 │
│ in _prepare_sagemaker_params │
│ │
│ 95 │ │ │ │ sm_param_value = sm_param_types[t](value_name, default_value=v) │
│ 96 │ │ │ else: │
│ 97 │ │ │ │ sm_param_value = ParameterString( │
│ ❱ 98 │ │ │ │ │ value_name, default_value=json.dumps(v) │
│ 99 │ │ │ │ ) │
│ 100 │ │ │ │
│ 101 │ │ │ sm_kedro_params.append(sm_param_key) │
│ │
│ /Users/williamc/miniconda3/envs/clv/lib/python3.10/json/__init__.py:231 in dumps │
│ │
│ 228 │ │ check_circular and allow_nan and │
│ 229 │ │ cls is None and indent is None and separators is None and │
│ 230 │ │ default is None and not sort_keys and not kw): │
│ ❱ 231 │ │ return _default_encoder.encode(obj) │
│ 232 │ if cls is None: │
│ 233 │ │ cls = JSONEncoder │
│ 234 │ return cls( │
│ │
│ /Users/williamc/miniconda3/envs/clv/lib/python3.10/json/encoder.py:199 in encode │
│ │
│ 196 │ │ # This doesn't pass the iterator directly to ''.join() because the │
│ 197 │ │ # exceptions aren't as detailed. The list call should be roughly │
│ 198 │ │ # equivalent to the PySequence_Fast that ''.join() would do. │
│ ❱ 199 │ │ chunks = self.iterencode(o, _one_shot=True) │
│ 200 │ │ if not isinstance(chunks, (list, tuple)): │
│ 201 │ │ │ chunks = list(chunks) │
│ 202 │ │ return ''.join(chunks) │
│ │
│ /Users/williamc/miniconda3/envs/clv/lib/python3.10/json/encoder.py:257 in iterencode │
│ │
│ 254 │ │ │ │ markers, self.default, _encoder, self.indent, floatstr, │
│ 255 │ │ │ │ self.key_separator, self.item_separator, self.sort_keys, │
│ 256 │ │ │ │ self.skipkeys, _one_shot) │
│ ❱ 257 │ │ return _iterencode(o, 0) │
│ 258 │
│ 259 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr, │
│ 260 │ │ _key_separator, _item_separator, _sort_keys, _skipkeys, _one_shot, │
│ │
│ /Users/williamc/miniconda3/envs/clv/lib/python3.10/json/encoder.py:179 in default │
│ │
│ 176 │ │ │ │ return JSONEncoder.default(self, o) │
│ 177 │ │ │
│ 178 │ │ """ │
│ ❱ 179 │ │ raise TypeError(f'Object of type {o.__class__.__name__} ' │
│ 180 │ │ │ │ │ │ f'is not JSON serializable') │
│ 181 │ │
│ 182 │ def encode(self, o): │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: Object of type date is not JSON serializable
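[Editor's note: the failing frame is kedro-sagemaker's generator.py, which runs json.dumps on each Kedro parameter to build a SageMaker ParameterString, and the stdlib JSON encoder has no handler for datetime.date, which is what YAML produces for an unquoted date. A minimal reproduction, with the quoting workaround mentioned above:

# Sketch: json.dumps rejects datetime.date, the type YAML yields for an
# unquoted value like 2023-11-20 in parameters.yml.
import datetime
import json

try:
    json.dumps(datetime.date(2023, 11, 20))
except TypeError as err:
    print(err)  # Object of type date is not JSON serializable

# Workaround from the thread: quote the date in parameters.yml
# (start_date: "2023-11-20") so it is loaded as a plain string.
print(json.dumps("2023-11-20"))
]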
Nok Lam Chan (11/20/2023, 8:48 AM):
The cap is because of moto - it has problems with newer versions of s3fs, which prevents us from bumping the version.
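[Editor's note: for context on that cap, s3fs 0.5.0 switched to aiobotocore, whose requests moto's decorator-style mocks generally don't intercept, so test suites typically have to move to moto's standalone server mode instead. A sketch of that pattern, assuming moto[server] is installed:

# Sketch: exercising a modern, aiobotocore-based s3fs against moto running
# as a local server instead of via the @mock_s3 decorator.
from moto.server import ThreadedMotoServer
import s3fs

server = ThreadedMotoServer(port=5555)
server.start()
fs = s3fs.S3FileSystem(
    key="test",
    secret="test",
    client_kwargs={"endpoint_url": "http://127.0.0.1:5555"},
)
fs.mkdir("test-bucket")                 # creates the bucket on the moto server
fs.pipe("test-bucket/x.txt", b"hello")  # write a small object
print(fs.cat("test-bucket/x.txt"))      # b'hello'
server.stop()
]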
marrrcin (11/20/2023, 8:51 AM):
[…]