# questions
p
Hey guys - I am currently trying to save/load pyspark ml objects through the catalog. The documentation (https://kedro.readthedocs.io/en/stable/tools_integration/pyspark.html#use-memorydataset-with-copy-mode-assign-for-non-dataframe-spark-objects) recommends using MemoryDataSets for those non-dataframe instances. That is all fine and well, but of course not being able to persist any transformers becomes quite tedious at some point. Is there any guidance/development on that front?
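For context, this is roughly what the docs' recommendation looks like through the Python API - just a minimal sketch, and the "spark_model" entry name is purely illustrative:
```python
from kedro.io import DataCatalog, MemoryDataSet

# copy_mode="assign" hands back the original object reference instead of
# attempting a deepcopy, which fails for JVM-backed Spark objects
catalog = DataCatalog(
    {
        "spark_model": MemoryDataSet(copy_mode="assign"),
    }
)
```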
d
can you explain more about the type of object you’re looking to serialise?
I think this is the first time we’ve had a user ask for this - it should be a pretty simple custom dataset to implement
and lastly, what happens if you try and use one of the PickleDataSet engines? Though these will be JVM objects, so not pickleable
p
Exactly, it is not pickleable, since it is a jvm object. Sorry for asking again: what do you mean by building a wrapper around the MLWriter and MLReader? Do you mean building a kedro dataset that utilizes these classes?
And thanks for the quick answers so far
d
exactly
so you just wrap the load and save methods of those classes and it should work - we’d also love a contribution back into the project if you get it working!
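for illustration, something like this minimal sketch - assuming a PipelineModel and the AbstractDataSet base class; the SparkMLModelDataSet name and the filepath handling are just placeholders:
```python
from typing import Any, Dict

from kedro.io import AbstractDataSet
from pyspark.ml import PipelineModel


class SparkMLModelDataSet(AbstractDataSet):
    """Persists Spark ML objects through their native MLWriter/MLReader
    interfaces rather than pickle, so JVM-backed models survive a run."""

    def __init__(self, filepath: str):
        self._filepath = filepath

    def _save(self, model: PipelineModel) -> None:
        # MLWritable objects expose .write(), which returns an MLWriter
        model.write().overwrite().save(self._filepath)

    def _load(self) -> PipelineModel:
        # MLReadable classes expose .load(), backed by their MLReader
        return PipelineModel.load(self._filepath)

    def _describe(self) -> Dict[str, Any]:
        return {"filepath": self._filepath}
```
then you'd register it in the catalog like any other dataset, with filepath pointing at a location Spark can write to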
p
that sounds great - appreciate the help!
Thanks