William Caicedo
11/15/2023, 10:32 PM
kedro-sagemaker question: I managed to get the pipeline showing in Processing jobs, but then I get an "Error: No such command 'sagemaker'" error. I have kedro-sagemaker in my requirements.txt file and I'm building and pushing the image myself, so I just do a kedro sagemaker run. Any ideas what I am doing wrong?

marrrcin
11/15/2023, 11:30 PM
Are you running the command from the project root (cd to the folder with the src and conf folders)?
William Caicedo
11/16/2023, 11:00 PM
# Do not change the default entrypoint, it will break the Kedro SageMaker integration!
ENTRYPOINT ["kedro", "sagemaker", "entrypoint"]
working_directory: /home/kedro

William Caicedo
11/16/2023, 11:02 PM
kedro run works, but not kedro sagemaker
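If kedro itself runs but the sagemaker command group is missing, the plugin's CLI commands are not being registered. A hedged diagnostic sketch to run in a Python shell inside the image, assuming kedro's documented plugin entry-point groups (this is stdlib introspection, not kedro-sagemaker API):

from importlib.metadata import entry_points

# Kedro discovers plugin CLI commands through entry-point groups; if the
# "sagemaker" command group is ever going to exist, it has to show up here.
# (The group= keyword requires Python 3.10+.)
for group in ("kedro.project_commands", "kedro.global_commands"):
    for ep in entry_points(group=group):
        print(group, "->", ep.name, "=", ep.value)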
marrrcin
11/17/2023, 8:54 AM
Did you add kedro-sagemaker to the requirements.txt?

William Caicedo
11/17/2023, 9:01 AM
black~=22.0
flake8>=3.7.9, <4.0
ipython>=7.31.1, <8.0
isort~=5.0
jupyter~=1.0
jupyterlab~=3.0
kedro==0.18.8
kedro-mlflow
kedro-datasets[spark.SparkDataSet, tensorflow.TensorFlowModelDataSet, pickle.PickleDataSet]~=1.4.0
kedro-sagemaker
nbstripout~=0.4
pyarrow
pymc-marketing
pyspark==3.3.0
pytest-cov~=3.0
pytest-mock>=1.7.1, <2.0
pytest~=7.2
scikit-learn
tensorflow==2.15.0
tensorflow-probability==0.22.1
tensorflow_io

marrrcin
11/17/2023, 9:39 AM
• check that pip freeze | grep sagemaker in the docker image shows the plugin installed
• change the workdir in the docker image to something else - something that doesn't have kedro or kedro_sagemaker / kedro-sagemaker in the name (make sure to also update the sagemaker.yml accordingly)
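Both checks above can be run from a Python shell inside the container; a minimal sketch, stdlib only (the shadowing concern in the second bullet is that a folder named kedro on the path could be imported instead of the installed package):

from importlib.metadata import version

# 1) Which version of the plugin did pip actually resolve?
print("kedro-sagemaker", version("kedro-sagemaker"))

# 2) Is the imported kedro the site-packages one, or is a same-named
#    folder in the working directory shadowing it?
import kedro
print(kedro.__file__)  # should point into site-packages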
William Caicedo
11/19/2023, 8:36 PM
root@bc309858e78f:/home/kedro# pip freeze | grep sagemaker
kedro-sagemaker==0.0.1

William Caicedo
11/19/2023, 8:37 PM
Looks like a version conflict between kedro-sagemaker and kedro-datasets, with s3fs being the problem.

William Caicedo
11/19/2023, 8:39 PM
When I pin kedro-sagemaker to 0.3.0 I get the following error:
#9 75.16     kedro-sagemaker 0.3.0 depends on s3fs<2023.0.0 and >=2022.11.0
#9 75.16     kedro-datasets[pickle-pickledataset,spark-sparkdataset,tensorflow-tensorflowmodeldataset] 1.4.0 depends on s3fs<0.5 and >=0.3.0; extra == "spark.sparkdataset"
So if the version is not pinned in the requirements.txt file, the only resolution pip can find is to install v0.0.1 of kedro-sagemaker, which causes my original error.
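The two s3fs ranges in the resolver output above have an empty intersection, so pip backtracks to the ancient kedro-sagemaker 0.0.1 (presumably because that release carries no conflicting s3fs requirement). A sketch with the packaging library (assumed available; it implements the same specifier logic pip uses):

from packaging.specifiers import SpecifierSet
from packaging.version import Version

# The two s3fs constraints quoted in the pip error above
sagemaker_req = SpecifierSet(">=2022.11.0,<2023.0.0")  # kedro-sagemaker 0.3.0
spark_extra_req = SpecifierSet(">=0.3.0,<0.5")         # kedro-datasets[spark...] 1.4.0

for candidate in ("0.4.2", "2022.11.0"):
    v = Version(candidate)
    print(candidate, v in sagemaker_req, v in spark_extra_req)
# 0.4.2     -> False True
# 2022.11.0 -> True False   (no s3fs version satisfies both ranges)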
marrrcin
11/20/2023, 8:21 AM
Yes, kedro-sagemaker pins s3fs = "^2022.11.0".
@Nok Lam Chan / @datajoely do you know why kedro-datasets[spark-sparkdataset] has a strict limit on an old version of s3fs? The range >=0.3.0,<0.5 is from 2019-2020 😮

marrrcin
11/20/2023, 8:24 AM
You could try dropping spark-sparkdataset from the kedro-datasets extras and installing its dependencies separately, so in the requirements.txt you would have:
kedro-sagemaker~=0.3.0
kedro-datasets[pickle-pickledataset,tensorflow-tensorflowmodeldataset]~=1.4.0
s3fs
hdfs>=2.5.8, <3.0
and see what happens there.

marrrcin
11/20/2023, 8:27 AM
I'll yank 0.0.1 to avoid future problems like that.

William Caicedo
11/20/2023, 8:33 AM
I forked kedro-datasets and bumped the s3fs version to 2022.11.0 just to check, and the pipeline worked with no issues. I haven't run any of the kedro-datasets tests yet, though.
Also, I got a JSON serialization error when I had some dates as parameters in my parameters.yml. The workaround was of course to put quotes around them and treat them as strings. Have you seen that error before?
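For context on why dates break: a YAML loader that follows YAML 1.1 timestamp resolution (e.g. PyYAML's safe_load) turns an unquoted 2023-11-20 into a datetime.date, and the stdlib json encoder refuses that type. A minimal repro of both the failure and the quoting workaround, assuming PyYAML:

import json
from datetime import date

import yaml

# Unquoted ISO dates in YAML become datetime.date objects...
params = yaml.safe_load("start_date: 2023-11-20")
print(type(params["start_date"]))   # <class 'datetime.date'>

# ...which the stdlib json encoder rejects, exactly as in the plugin:
try:
    json.dumps(params["start_date"])
except TypeError as err:
    print(err)                      # Object of type date is not JSON serializable

# The quoting workaround keeps the value a plain string:
params = yaml.safe_load('start_date: "2023-11-20"')
print(json.dumps(params["start_date"]))  # "2023-11-20"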
marrrcin
11/20/2023, 8:36 AM
Can you share the stack trace?

William Caicedo
11/20/2023, 8:43 AM
/Users/williamc/miniconda3/envs/clv/lib/python3.10/site-packages/kedro_sagemaker/generator.py:98 │
│ in _prepare_sagemaker_params                                                                     │
│                                                                                                  │
│    95 │   │   │   │   sm_param_value = sm_param_types[t](value_name, default_value=v)            │
│    96 │   │   │   else:                                                                          │
│    97 │   │   │   │   sm_param_value = ParameterString(                                          │
│ ❱  98 │   │   │   │   │   value_name, default_value=json.dumps(v)                                │
│    99 │   │   │   │   )                                                                          │
│   100 │   │   │                                                                                  │
│   101 │   │   │   sm_kedro_params.append(sm_param_key)                                           │
│                                                                                                  │
│ /Users/williamc/miniconda3/envs/clv/lib/python3.10/json/__init__.py:231 in dumps                 │
│                                                                                                  │
│   228 │   │   check_circular and allow_nan and                                                   │
│   229 │   │   cls is None and indent is None and separators is None and                          │
│   230 │   │   default is None and not sort_keys and not kw):                                     │
│ ❱ 231 │   │   return _default_encoder.encode(obj)                                                │
│   232 │   if cls is None:                                                                        │
│   233 │   │   cls = JSONEncoder                                                                  │
│   234 │   return cls(                                                                            │
│                                                                                                  │
│ /Users/williamc/miniconda3/envs/clv/lib/python3.10/json/encoder.py:199 in encode                 │
│                                                                                                  │
│   196 │   │   # This doesn't pass the iterator directly to ''.join() because the                 │
│   197 │   │   # exceptions aren't as detailed.  The list call should be roughly                  │
│   198 │   │   # equivalent to the PySequence_Fast that ''.join() would do.                       │
│ ❱ 199 │   │   chunks = self.iterencode(o, _one_shot=True)                                        │
│   200 │   │   if not isinstance(chunks, (list, tuple)):                                          │
│   201 │   │   │   chunks = list(chunks)                                                          │
│   202 │   │   return ''.join(chunks)                                                             │
│                                                                                                  │
│ /Users/williamc/miniconda3/envs/clv/lib/python3.10/json/encoder.py:257 in iterencode             │
│                                                                                                  │
│   254 │   │   │   │   markers, self.default, _encoder, self.indent, floatstr,                    │
│   255 │   │   │   │   self.key_separator, self.item_separator, self.sort_keys,                   │
│   256 │   │   │   │   self.skipkeys, _one_shot)                                                  │
│ ❱ 257 │   │   return _iterencode(o, 0)                                                           │
│   258                                                                                            │
│   259 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,                      │
│   260 │   │   _key_separator, _item_separator, _sort_keys, _skipkeys, _one_shot,                 │
│                                                                                                  │
│ /Users/williamc/miniconda3/envs/clv/lib/python3.10/json/encoder.py:179 in default                │
│                                                                                                  │
│   176 │   │   │   │   return JSONEncoder.default(self, o)                                        │
│   177 │   │                                                                                      │
│   178 │   │   """                                                                                │
│ ❱ 179 │   │   raise TypeError(f'Object of type {o.__class__.__name__} '                          │
│   180 │   │   │   │   │   │   f'is not JSON serializable')                                       │
│   181 │                                                                                          │
│   182 │   def encode(self, o):                                                                   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: Object of type date is not JSON serializable
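The failing frame is the json.dumps(v) call at generator.py line 98. A hedged sketch of one possible plugin-side fix (an assumption for illustration, not the actual patch): give the encoder a fallback for unsupported types via default=str:

import json
from datetime import date

# default=str makes the encoder fall back to str() for unsupported types,
# so date(2023, 11, 20) is emitted as "2023-11-20" instead of raising.
print(json.dumps({"start_date": date(2023, 11, 20)}, default=str))
# {"start_date": "2023-11-20"}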
Nok Lam Chan
11/20/2023, 8:50 AM
Our tests use moto, and it has problems with newer versions of s3fs, which prevents us from bumping the version.