Is there a suggested way to set a numpy random state globally? Kedro #questions


Lorenzo Castellino

10/31/2022, 8:48 AM

Is there a suggested way to set the NumPy random state globally? I thought about using a global parameter, but it might be verbose to add it to all the functions that require it. I was also thinking about a custom Hook that would execute np.random.set_state() before the nodes that require it, but I would like to hear what your solution to "reproducible randomness" looks like 🙂
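For context on the call mentioned above: np.random.set_state() restores a state previously captured with np.random.get_state(), while np.random.seed() initializes the legacy global generator from an integer. A minimal sketch of both, where the seed value 42 is an arbitrary assumption:

```python
import numpy as np

# Seed the legacy global generator; 42 is an arbitrary, illustrative value.
np.random.seed(42)
first = np.random.rand(3)

# Re-seeding with the same value reproduces the same draws.
np.random.seed(42)
second = np.random.rand(3)

# get_state() captures the current global state; set_state() restores it.
saved_state = np.random.get_state()
draw_a = np.random.rand(3)
np.random.set_state(saved_state)
draw_b = np.random.rand(3)
```

For the "set and forget" case, np.random.seed() is probably the simpler of the two, since set_state() needs a state object that was saved earlier.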

Nok Lam Chan

10/31/2022, 9:08 AM

@Lorenzo Castellino Great question! Why do you need to set the seed at a node level? Normally I think setting it at import level would be fine. I might not even put it in the parameters, since this isn’t something that I would want to change or play around with.

Lorenzo Castellino

10/31/2022, 9:17 AM

No particular reason. As you said, it is something I would like to set and forget (even though I think there might be some niche use cases where different random states are required). Setting it at import level is actually a good idea, but where? In the pipeline_registry.py file, the appropriate node.py file, or another import location better suited for it?

Nok Lam Chan

10/31/2022, 9:58 AM

I haven’t thought about this too much. node.py may not be the best, since there could be multiple node files using numpy. pipeline_registry.py is fine, or if you really want to make it explicit, then a before_pipeline_run hook would do the job.
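A minimal sketch of what such a hook might look like. The class name SeedHooks and the default seed are hypothetical, and the fallback for hook_impl is only there so the snippet can run outside a Kedro environment:

```python
import numpy as np

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so the sketch can be run without Kedro installed.
    def hook_impl(func):
        return func


class SeedHooks:
    """Hypothetical hook class that seeds NumPy's legacy global generator."""

    def __init__(self, seed: int = 42):
        # 42 is an arbitrary, illustrative default.
        self.seed = seed

    @hook_impl
    def before_pipeline_run(self, run_params, pipeline, catalog):
        # Called once before the pipeline starts, so nodes running in the
        # same process draw from a freshly seeded global generator.
        np.random.seed(self.seed)
```

In a Kedro project this would typically be registered in settings.py with HOOKS = (SeedHooks(),). It only seeds the global generator of the process that runs the hook, so nodes executed in other processes may need separate handling.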

👍 2

Lorenzo Castellino

10/31/2022, 9:59 AM

I like the before_pipeline_run method. Thanks for the discussion and your view on the matter! 🙂 Maybe it is something that I could add to the "common use cases" section of the docs? There is already a before_pipeline_run example, but it's quite generic. A more pragmatic example could be a nice addition.

Nok Lam Chan

10/31/2022, 3:46 PM

That sounds like a good idea.

Nok Lam Chan

10/31/2022, 3:47 PM

Feel free to open a PR

Ben Levy

10/31/2022, 5:47 PM

I've often set it directly in settings.py

🤔 1

Deepyaman Datta

11/01/2022, 2:35 AM

Doing it in settings.py may work, but I think before_pipeline_run is a lot more idiomatic.

👍 3

Lorenzo Castellino

11/03/2022, 11:16 AM

I did a bit more research on the matter and came across this bit from the official scikit-learn docs. It suggests creating an rng variable that holds the desired np.random.RandomState() instance, which can then be passed down to the various classes that accept the random_state kwarg. The reasoning behind it is explained in this SO discussion, which is also linked in the aforementioned documentation. My idea at this point is to pickle the rng variable and load it where needed. This feels a bit tedious because all the corresponding functions and pipelines need to be updated accordingly. Is there a more elegant way to do so? I still think that Hooks might be the key, but I can't see how...
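A rough sketch of the pickling idea described above, using only NumPy. The add_noise function stands in for any node that accepts a random_state argument and is purely hypothetical:

```python
import pickle

import numpy as np

# Create the generator once; 0 is an arbitrary, illustrative seed.
rng = np.random.RandomState(0)

# Pickling captures the generator's internal state at this point.
blob = pickle.dumps(rng)


def add_noise(values, random_state):
    """Hypothetical node function that draws from the generator it is given."""
    return values + random_state.normal(size=len(values))


values = np.zeros(4)
original = add_noise(values, rng)

# A generator restored from the pickle replays the same draws.
restored_rng = pickle.loads(blob)
replayed = add_noise(values, restored_rng)
```

One thing to keep in mind with this approach: every draw advances the state of the instance, so when the same rng object is shared by several functions, the results depend on the order in which they are called.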

Ben Levy

11/03/2022, 1:10 PM

You could initialize it as a global variable in a helper module (e.g., rng.py) and then just import it into each node function if that's the strategy you want to employ (from rng import RNG).
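A minimal sketch of that layout, shown as a single file so it runs on its own. The seed value and the shuffle_rows node are hypothetical, and in a real project the two parts would live in separate modules:

```python
import numpy as np

# --- rng.py (hypothetical helper module) ---
SEED = 42  # arbitrary, illustrative value
RNG = np.random.RandomState(SEED)


# --- a nodes file would start with: from rng import RNG ---
def shuffle_rows(data):
    """Hypothetical node that shuffles rows using the shared generator."""
    return data[RNG.permutation(len(data))]
```

Depending on how the project is packaged, the import may need the package prefix rather than a bare from rng import RNG.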

👍 1

Lorenzo Castellino

11/03/2022, 1:48 PM

This sounds like a fine idea
