# plugins-integrations
g
Hi Team! Anyone ever played with hyperparameter tuning frameworks within Kedro? I have found several scattered pieces of info related to this topic, but no complete solutions. Ultimately, what I would like to set up is a way to have multiple nodes running at the same time, all contributing to the same tuning experiment. I would prefer using Optuna, and this is how I would go about it based on what I have found online:
1. Create a node that creates an Optuna study.
2. Create N nodes that each run hyperparameter tuning in parallel. Each of them loads the Optuna study, and if using kedro-mlflow, each hyperparameter trial can be logged into its own nested run.
3. Create a final node that processes the results of all tuning nodes.
Does this sound reasonable to you? Has anyone produced such a Kedro workflow already? I would love to see what it looks like. I am also wondering:
• I am thinking of creating an OptunaStudyDataset for the Optuna study. Has anyone attempted this already?
• For creating the N tuning nodes, I am thinking of using the approach presented in the GetInData blog post on dynamic pipelines. Would this be the recommended approach?
Thanks!
h
Someone will reply to you shortly. In the meantime, this might help:
j
for now the semi-official approach is the blog post you mentioned - how was that process by the way? any pros and cons you saw?
I think some folks have tried to use Optuna w/ Kedro in the past
🥳 1
g
Do you mean it is semi-official because there's not yet an official approach? Is there any discussion I could follow?
I have not tried implementing it yet, for now it seems reasonable to me but I am asking because I am trying to understand the pros and cons. Once I get to it, happy to give some feedback (and maybe even some simple code example).
h
Hey, I created a setup for this some time ago, where I use an Optuna study dataset and a YAML configuration loader, so you can set all the trial parameters in your conf. If you'd like, we can discuss?
g
Hi @Hugo Evers! Yes, that would be super nice, thank you!
@Juan Luis I just tried the dynamic pipeline setup. It's actually very similar to what I have been doing so far, except I use native YAML inheritance instead of the OmegaConfLoader merge resolver with the custom _overrides keys. (BTW, those keys show up when you do kedro catalog list.) I feel it looks much neater. Is there any drawback to doing it that way? Let me give you an example. Blog post parameter file:
study_params:
  study_name: test
  load_if_exists: true
  direction: maximize
  n_trials_per_process: 10

price_predictor:
  _overrides:
    study_name: price_predictor_base
  study_params: ${merge:${study_params},${._overrides}}

  base:
    study_params: ${..study_params}

  candidate1:
    _overrides:
      study_name: price_predictor_candidate1
    study_params: ${merge:${..study_params},${._overrides}}

  candidate2:
    _overrides:
      study_name: price_predictor_candidate2
    study_params: ${merge:${..study_params},${._overrides}}

  candidate3:
    _overrides:
      study_name: price_predictor_candidate3
    study_params: ${merge:${..study_params},${._overrides}}

reviews_predictor:
  _overrides:
    study_name: reviews_predictor_base
  study_params: ${merge:${study_params},${._overrides}}

  base:
    study_params: ${..study_params}

  test1:
    _overrides:
      study_name: reviews_predictor_test1
    study_params: ${merge:${..study_params},${._overrides}}
Using the native YAML inheritance:
study_params: &base_study_params
  study_name: test
  load_if_exists: true
  direction: maximize
  n_trials_per_process: 10

price_predictor: 
  base: 
    study_params: &price_predictor_base_study_params
      <<: *base_study_params
      study_name: price_predictor_base

  candidate1:
    study_params:
      <<: *price_predictor_base_study_params
      study_name: price_predictor_candidate1

  candidate2:
    study_params:
      <<: *price_predictor_base_study_params
      study_name: price_predictor_candidate2

  candidate3:
    study_params:
      <<: *price_predictor_base_study_params
      study_name: price_predictor_candidate3

reviews_predictor:
  base: 
    study_params: &reviews_predictor_base_study_params
      <<: *base_study_params
      study_name: reviews_predictor_base

  test1:
    study_params:
      <<: *reviews_predictor_base_study_params
      study_name: reviews_predictor_test1
Happy to hear your thoughts on this!
j
It's actually very similar to what I have been doing so far except I use native YAML inheritance instead of the OmegaConfLoader merge resolver with the custom _overrides.
I do prefer the YAML merge keys version actually 😄 @marrrcin any thoughts?
m
It will not work if you want to override some parts, no?
g
@marrrcin I am not sure I understand. 😅 Do you have an example?
m
So if you have:
study_params: &base_study_params
  study_name: test
  load_if_exists: true
  direction: maximize
  n_trials_per_process: 10
and then
reviews_predictor:
  base: 
    study_params: &reviews_predictor_base_study_params
      <<: *base_study_params
      study_name: reviews_predictor_base
Is study_name correctly overwritten? If so, then it's super cool!
g
I can confirm it is:
yaml.safe_load(open("./test.yml"))
Out[5]: 
{'study_params': {'study_name': 'test',
  'load_if_exists': True,
  'direction': 'maximize',
  'n_trials_per_process': 10},
 'reviews_predictor': {'base': {'study_params': {'study_name': 'reviews_predictor_base',
    'load_if_exists': True,
    'direction': 'maximize',
    'n_trials_per_process': 10}}}}
Here is my test.yml:
study_params: &base_study_params
  study_name: test
  load_if_exists: true
  direction: maximize
  n_trials_per_process: 10

reviews_predictor:
  base: 
    study_params:
      <<: *base_study_params
      study_name: reviews_predictor_base
I also think it's quite cool, and I like the fact that it's native YAML, so it's completely Kedro-independent.
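One drawback worth keeping in mind, though (a general YAML fact rather than anything Kedro-specific): merge keys do a shallow merge, so a nested mapping in the child replaces the base mapping wholesale, whereas OmegaConf's merge is recursive. A quick check with PyYAML, using made-up keys:

```python
import yaml

doc = """
base: &base
  params:
    a: 1
    b: 2
  direction: maximize
child:
  <<: *base
  params:
    a: 10
"""

data = yaml.safe_load(doc)
# direction is inherited from the anchor, but the nested params mapping is
# replaced entirely: child.params has no "b" key, because <<: is shallow.
print(data["child"])
```

As long as the overridable blocks stay flat (like study_params here), this never bites; with deeper nesting the OmegaConf merge resolver keeps the untouched leaves while merge keys drop them.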
this 2