Gleydson Silva
12/06/2023, 8:37 PM

Deepyaman Datta
12/06/2023, 8:38 PM

Gleydson Silva
12/06/2023, 8:39 PM
12/06/2023, 8:39 PM@hook_impl
def after_catalog_created(
self,
catalog: DataCatalog,
load_versions: Dict[str, str]
):
<http://self._logger.info|self._logger.info>(f"catalog: {catalog.datasets.__dict__}")
<http://self._logger.info|self._logger.info>(f"load_versions: {load_versions}")
Gleydson Silva
12/06/2023, 8:43 PM

Gleydson Silva
12/06/2023, 8:48 PM
kedro run --pipeline=scorer
it uses the latest version of my trained model, but I can't get that version in my hook or anywhere else.
When I run kedro run --pipeline=scorer --load-version=myversion
I can get the version in my hook.
I would like to get the version of the loaded dataset in both cases.

Deepyaman Datta
12/06/2023, 8:54 PM
load_versions
in the hook is basically going to pick up whatever was passed at catalog creation, which is what you're passing in the CLI; it doesn't actually contain the versions that get loaded. I also don't think it could pull the actual versions at that time, since the actual "latest" version is only figured out when you go to load the dataset (by parsing the file structure).

Deepyaman Datta
12/06/2023, 8:56 PM

Gleydson Silva
12/06/2023, 8:57 PM

Deepyaman Datta
12/06/2023, 9:04 PM
@hook_impl
def after_catalog_created(
    self,
    catalog: DataCatalog,
):
    self._logger.info(f"catalog: {catalog.datasets.__dict__}")
    model = catalog.datasets["my_model"]
    self._logger.info(f"model load version: {model.resolve_load_version()}")
try something like this? i'm writing it just from reading the code, so you may need to tweak something if it doesn't work 🙂

Deepyaman Datta
12/06/2023, 9:04 PM

Gleydson Silva
12/06/2023, 9:05 PM

Nok Lam Chan
12/07/2023, 9:39 AM

Deepyaman Datta
12/07/2023, 1:59 PM
Would resolve_load_version
be sufficient in this case? I see it mentioned in the issue as well. In @Gleydson Silva’s case, specifically, he's using it in a scorer
pipeline, so you don't even need to worry about a new version of the model being produced before execution.

Nok Lam Chan
12/07/2023, 2:07 PM
resolve_load_version
should be fine. My point (the GitHub issue) is that this is not currently exposed in any public API. The load version is fetched when needed, and it doesn't update the dataset definition. Thus you will still have self.load_version=None
in the dataset object itself.
It will probably work if you call resolve_load_version
in the hook manually.

Deepyaman Datta
12/07/2023, 2:12 PM
> The load version is fetched when needed and it doesn't update the dataset definition.
I think this is fine TBH, or at least "working as intended".
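The behaviour discussed above — the load version is resolved on demand, and resolving it never mutates the dataset definition — can be sketched in isolation. This is not Kedro's actual implementation; it is a toy class that mimics the idea that an explicit --load-version wins, while otherwise the "latest" version is computed from timestamped folder names at resolution time:

```python
class VersionedDatasetSketch:
    """Toy stand-in for a versioned dataset (not Kedro's real class)."""

    def __init__(self, version=None):
        # Stays None unless a version was passed explicitly
        # (e.g. via --load-version on the CLI).
        self.load_version = version

    def resolve_load_version(self, versions_on_disk):
        # Resolution happens at call time and does NOT mutate the dataset:
        # an explicit version wins; otherwise take the newest timestamp.
        # ISO-8601-style timestamps sort lexicographically, so max() works.
        if self.load_version is not None:
            return self.load_version
        return max(versions_on_disk)


on_disk = ["2023-12-05T10.00.00.000Z", "2023-12-06T20.37.00.000Z"]

ds = VersionedDatasetSketch()
print(ds.resolve_load_version(on_disk))  # newest timestamp on disk
print(ds.load_version)                   # still None: definition unchanged

pinned = VersionedDatasetSketch(version="myversion")
print(pinned.resolve_load_version(on_disk))  # explicit version wins
```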
Nok Lam Chan
12/07/2023, 2:16 PM

Gleydson Silva
12/08/2023, 8:15 PM
@hook_impl
def after_catalog_created(
    self,
    catalog: DataCatalog,
):
    self._logger.info(f"catalog: {catalog.datasets.__dict__}")
    model = catalog.datasets.__dict__["my_model"]
    self._logger.info(f"model load version: {model.resolve_load_version()}")
This seems to have solved my problem. I just added this __dict__
to your example @Deepyaman Datta. Thank you guys 🙂
IMHO, this information should be in the docs. I don't think it's unusual for people to keep track of which version of a model was used in a scoring pipeline.

Deepyaman Datta
12/08/2023, 11:25 PM
catalog.datasets.my_model
or getattr(catalog.datasets, "my_model")
would work slightly more cleanly; I don't 100% remember if it still does (it used to), just trying to avoid accessing non-public parts of the API. 🙂
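To see why the attribute-style and getattr spellings reach the same object as the __dict__ lookup, here is a tiny self-contained sketch using a plain namespace as a stand-in for catalog.datasets (not Kedro code):

```python
from types import SimpleNamespace

# Stand-in for catalog.datasets: an object whose attributes are datasets.
datasets = SimpleNamespace(my_model="<the dataset object>")

# All three spellings reach the same underlying attribute; the first two
# avoid touching the non-public __dict__ directly.
a = datasets.my_model
b = getattr(datasets, "my_model")
c = datasets.__dict__["my_model"]
print(a is b is c)  # True
```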