# questions
l
Is there a recommended way to set the NumPy random state globally? I thought about using a global parameter, but it would be verbose to add it to every function that requires it. I was also thinking about a custom Hook that would execute
np.random.set_state()
before the nodes that require it, but I would like to hear what your solution to "reproducible randomness" looks like 🙂
n
@Lorenzo Castellino Great question! Why do you need to set the seed at a node level? Normally I think setting it at import level would be fine. I might not even put it in the parameters, since this isn't something that I would want to change or play around with.
l
No particular reason. As you said, it's something I would like to set and forget (even though I think there might be some niche use cases where different random states are required). Setting it at import level is actually a good idea, but where? In the
pipeline_registry.py
file, the appropriate
node.py
file, or in another import location better suited for it?
n
I haven’t thought about this too much.
node.py
may not be the best place, since there could be multiple node files using numpy.
pipeline_registry.py
is fine, or if you really want to make it explicit then a
before_pipeline_run
hook would do the job.
👍 2
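A minimal sketch of the `before_pipeline_run` idea, for reference. The class name `SeedNumpyHooks` and the `SEED` value are hypothetical, and the `try`/`except` fallback is only there so the snippet also runs where Kedro is not installed. It seeds NumPy's legacy global generator, so it only affects code that draws from `np.random.*` directly.

```python
import numpy as np

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so this sketch can run outside a Kedro project.
    def hook_impl(func):
        return func

SEED = 42  # hypothetical value; pick whatever suits the project


class SeedNumpyHooks:
    """Seed NumPy's global generator once, before any node runs."""

    @hook_impl
    def before_pipeline_run(self, run_params):
        # run_params is supplied by Kedro; it is unused here.
        np.random.seed(SEED)
```

In a Kedro project, the hook would then be registered in `settings.py` with something like `HOOKS = (SeedNumpyHooks(),)`.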
l
I like the
before_pipeline_run
method. Thanks for the discussion and your view on the matter! 🙂 Maybe it's something that I could add to the "common use cases" section of the docs? There is already a
before_pipeline_run
example but it's quite generic. Maybe a more pragmatic example could be a nice addition.
n
That sounds like a good idea.
Feel free to open a PR
b
I've often set it directly in
settings.py
🤔 1
d
Doing it in
settings.py
may work, but I think
before_pipeline_run
is a lot more idiomatic.
👍 3
l
I did a bit more research on the matter and came across this bit from the official scikit-learn docs. It suggests creating an rng variable that holds the desired
np.random.RandomState()
that can be passed down to the various classes that accept the
random_state
kwarg. The reasoning behind it is explained in this SO discussion, which is also linked in the aforementioned documentation. My idea at this point is to actually pickle the rng variable and load it where needed. This feels a bit tedious because all the corresponding functions and pipelines need to be updated accordingly. Is there a more elegant way to do so? I still think that Hooks might be the key, but I can't see how...
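A rough sketch of the pattern described above, using only NumPy so it stays self-contained. The function `sample_rows` and the toy `data` array are hypothetical stand-ins for a node that accepts a `random_state` argument. The last line shows that a `RandomState` instance survives a pickle round trip with its internal state intact.

```python
import pickle

import numpy as np


def sample_rows(data, n, random_state):
    """Pick n rows without replacement using the supplied generator."""
    idx = random_state.choice(len(data), size=n, replace=False)
    return data[idx]


rng = np.random.RandomState(0)  # one generator, passed wherever it is needed
data = np.arange(20).reshape(10, 2)
subset = sample_rows(data, 3, random_state=rng)

# Pickling captures the generator's current state, so the restored copy
# continues the same sequence from this point on.
restored = pickle.loads(pickle.dumps(rng))
```

Note that every consumer sharing one instance advances the same sequence, so results depend on the order in which the functions are called.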
b
You could initialize it as a global variable in a helper module (e.g.,
rng.py
) and then just import it into each node function, if that's the strategy you want to employ (
from rng import RNG
)
👍 1
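The helper-module suggestion could look roughly like this. The `SEED` value and the `add_noise` node function are hypothetical, and in a real project the function would live in a separate nodes file and pull the generator in with `from rng import RNG`; both parts are shown in one block here so the sketch runs on its own.

```python
# Contents of a hypothetical rng.py helper module
import numpy as np

SEED = 42  # assumed value
RNG = np.random.RandomState(SEED)  # created once, at import time


# In a nodes file this would follow `from rng import RNG`
def add_noise(values, rng=RNG):
    """Return values plus Gaussian noise drawn from the shared generator."""
    values = np.asarray(values, dtype=float)
    return values + rng.normal(size=values.shape)


noisy = add_noise([1.0, 2.0, 3.0])
```

Keeping `rng` as a keyword argument with the shared instance as its default means a test can still pass in its own generator.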
l
This sounds like a fine idea