Hey guys - I am currently trying to save/load pysp...
# questions
Hey guys - I am currently trying to save/load PySpark ML objects through the catalog. The documentation (https://kedro.readthedocs.io/en/stable/tools_integration/pyspark.html#use-memorydataset-with-copy-mode-assign-for-non-dataframe-spark-objects) recommends using a MemoryDataSet with copy_mode: assign for those non-DataFrame instances. That is all fine and well, though of course not being able to save any transformers becomes quite tedious at some point. Is there any guidance/development on that front?
can you explain more about the type of object you’re looking to serialise?
I think this is the first time we’ve had a user ask for this - this should be a pretty simple custom dataset to implement
and lastly, what happens if you try to use one of the PickleDataSet engines? These will be JVM objects, so they may not be picklable
Exactly, it is not picklable, since it is a JVM object. Sorry for asking again: what do you mean by building a wrapper around the MLWriter and reader? Do you mean building a Kedro dataset that utilizes these classes?
And thanks for answering already
so you just wrap the load and save methods of those classes and it should work - we’d also love a contribution back into the project if you get it working!
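A rough sketch of what such a wrapper might look like, not a tested Kedro implementation. The class name SparkMLDataSet and the model_class argument are made up for illustration. In a real project the class would subclass Kedro's abstract dataset base class (AbstractDataSet in the Kedro versions discussed here); that is left out so the snippet runs on its own. It assumes the object follows the PySpark ML persistence interface: write() returns a writer with overwrite() and save(path), and the class has a load(path) classmethod.

```python
from typing import Any, Dict


class SparkMLDataSet:
    """Hypothetical dataset that persists a PySpark ML object via its own writer/reader."""

    def __init__(self, filepath: str, model_class: Any) -> None:
        # model_class is the class used for loading, e.g. pyspark.ml.PipelineModel
        self._filepath = filepath
        self._model_class = model_class

    def _load(self) -> Any:
        # MLReadable classes expose a load(path) classmethod
        return self._model_class.load(self._filepath)

    def _save(self, data: Any) -> None:
        # MLWritable objects expose write(), which returns an MLWriter;
        # overwrite() lets repeated pipeline runs replace the previous output
        data.write().overwrite().save(self._filepath)

    def _describe(self) -> Dict[str, Any]:
        # Summary of the dataset's configuration
        return {
            "filepath": self._filepath,
            "model_class": self._model_class.__name__,
        }
```

With PySpark available, something like SparkMLDataSet("data/06_models/pipeline", PipelineModel) should then be able to round-trip a fitted pipeline; the path here is only an example.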
that sounds great - appreciate the help!