# questions
m
Hi everyone. Could a dataset be an input to another dataset? Yeah… I know… Strange question. 😅 Let me clarify: I need to load and “manipulate” the pickle of a model “tagged” as ‘Production’ by MLflow (i.e. this is not for serving, in which case I could have used the MLflow serving CLI). In order to do so, I first need to query MLflow’s tracking db to get the source / path of that model. So, to rephrase my initial question: could a pandas.SQLQueryDataSet (the model source / path) become an input to a pickle.PickleDataSet (the model itself)? Or am I framing / approaching the problem from the wrong angle? Many thanks in advance. Regards, M
m
Hah, nice one
Lazy datasets ftw - I did a PoC of that once 😄 Effectively, you can implement something like a “lazy dataset”: a custom DataSet whose _load returns a function that then loads the actual dataset based on the params you pass to it:
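from kedro.io import AbstractDataSet
from kedro.extras.datasets.pickle import PickleDataSet

class LazyDataSet(AbstractDataSet):
    # constructors and other stuff
    def _load(self):
        # Return a loader instead of data; the path is supplied later by a node.
        def lazy_loader(path):
            return PickleDataSet(path).load()
        return lazy_loader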
And then you do this in 2 nodes:
1. node(func=<extract the path you need>, inputs="from_sql_query", outputs="path_you_need")
2. node(func=lambda path, lazy: lazy(path), inputs=["path_you_need", "lazy_dataset"], outputs=<name for the loaded model>)
Sorry for many shortcuts, but I wanted to keep it brief. You get the idea 😉
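To make the flow concrete without the shortcuts, here is a minimal sketch of the same idea in plain Python. It deliberately uses only the standard library as stand-ins, so none of it is Kedro or MLflow API: `make_lazy_loader` plays the role of the lazy dataset's load, `extract_path` and `load_model` play the two nodes, and the `models` table with its `source` and `stage` columns is a hypothetical stand-in for the tracking db, not its real schema.

```python
import pickle
import sqlite3
import tempfile
from pathlib import Path


def make_lazy_loader():
    """Stand-in for the lazy dataset's load: returns a loader, not data."""
    def lazy_loader(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    return lazy_loader


def extract_path(rows):
    """Stand-in for node 1: pull the model path out of the query result."""
    return rows[0][0]


def load_model(path, lazy):
    """Stand-in for node 2: call the lazy loader with the resolved path."""
    return lazy(path)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Write a dummy "model" pickle to a temporary location.
        model_path = Path(tmp) / "model.pkl"
        model_path.write_bytes(pickle.dumps({"weights": [1, 2, 3]}))

        # Hypothetical table standing in for the tracking database.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE models (source TEXT, stage TEXT)")
        conn.execute(
            "INSERT INTO models VALUES (?, ?)", (str(model_path), "Production")
        )
        rows = conn.execute(
            "SELECT source FROM models WHERE stage = 'Production'"
        ).fetchall()
        conn.close()

        # Node 1 resolves the path, node 2 loads the pickle through the loader.
        path = extract_path(rows)
        return load_model(path, make_lazy_loader())
```

The point of the design is that the path is only known at run time, so the dataset hands the pipeline a function and the second node supplies the path when it calls it.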
m
Thx a lot @marrrcin 😃 🙏🏼 I will give it a shot today & will let you know how it goes 😉
Ah… It works like a charm ! What an elegant / brilliant solution. 🤩 Thx a lot @marrrcin (though I must confess that I’m a bit jealous that I can’t, for now, come up with such solutions by myself 😜 ) Thx again 🙏🏼
m
😎 🙂