# questions
j
I am facing an issue where a MetricsDataSet loads successfully from the catalog in a notebook, where the catalog is created with %load_ext kedro.ipython. However, in a standalone file, when I create the catalog as follows:
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

project_path = Path(".").resolve()
metadata = bootstrap_project(project_path)
with KedroSession.create(metadata.package_name, project_path) as session:
    context = session.load_context()
    catalog = context.catalog

data = catalog.load("my_metrics")
I get the following error:
DataSetError: Loading not supported for 'MetricsDataSet'
If this is true, why does it load in a notebook?
j
hi @Jordan, I'm surprised it loads at all - the source code shows that there's no _load implementation: https://github.com/kedro-org/kedro-plugins/blob/435689f0eba9f643e1f9e8ac75f2151279[…]7e476/kedro-datasets/kedro_datasets/tracking/metrics_dataset.py
are you certain that it's working inside the notebook?
j
@Juan Luis That's what I was thinking, but it certainly seems to:
I was hoping to embed the metrics into a Streamlit dashboard when I came across this
j
@Jordan I tried locally and can't reproduce. But with your help we can try to understand what's going on. One interesting thing I observe is that your .exists method returns True, whereas in my case it returns False. Could you please run this code in the notebook session (where the loading is working for you):
catalog._get_dataset("original_total_power")._version
and also
_ds._load??
and paste or show the outputs here?
j
@Juan Luis Thanks for looking into it! Here's the output I get:
j
oops sorry, _ds was supposed to be this:
_ds = catalog._get_dataset("original_total_power")
then _ds._load?? should work
j
Ah okay, so it looks like it's falling back to a JSONDataSet to do the loading. Any idea why this might be happening?
j
okay, at least now we understand why the load proceeds 🤔 but still unclear why MetricsDataSet is not there
can you do
from kedro_datasets.tracking.metrics_dataset import MetricsDataSet
and then show the full source using MetricsDataSet??
finally, printing the kedro_datasets version will definitely help:
import kedro_datasets; print(kedro_datasets.__version__)
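(Editorial aside: since several related distributions may be installed side by side, it can help to list all of their versions at once. The following is a minimal sketch using only the standard library; the helper name installed_versions is hypothetical, and the distribution names passed to it are taken from this thread.)

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for dist in distributions:
        try:
            found[dist] = version(dist)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[dist] = None
    return found


# Distribution names mentioned in this thread.
print(installed_versions(["kedro", "kedro-datasets", "kedro-viz"]))
```

Running this in both the notebook kernel and the standalone script's interpreter would show whether the two are actually using the same environment.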
j
Hmm. I have both Kedro and Kedro Datasets installed in my environment, I don't know if that could confuse anything
I see that MetricsDataSet inherits from JSONDataSet
j
okay, so it looks like we both have kedro_datasets 1.2.0. yes, MetricsDataSet inherits from JSONDataSet, but I was expecting to see a _load method there that would override the parent one. I see ... there, maybe the output was truncated?
what about
print(MetricsDataSet._load)  # I expect <function JSONDataSet._load at ...> in your case, should be <function MetricsDataSet._load at ...> instead
print(MetricsDataSet.__init__)  # I expect <function JSONDataSet.__init__ at ...> in both
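(Editorial aside: the check above relies on the fact that a function's printed name records the class where it was defined, not the class it was fetched from. The sketch below uses hypothetical stand-in classes, not the Kedro implementations, and shows one way such output could arise: an attribute reassigned at runtime. That scenario is an assumption for illustration; the thread itself does not establish the cause at this point.)

```python
# Hypothetical stand-ins; these are NOT the real Kedro classes.
class ParentDataSet:
    def _load(self):
        return {"loaded": True}


class WriteOnlyDataSet(ParentDataSet):
    def _load(self):
        raise RuntimeError(f"Loading not supported for '{self.__class__.__name__}'")


def defining_class(cls, name):
    """Return the first class in the MRO whose own namespace holds `name`."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return klass
    return None


# Before any patching, the subclass owns its override.
before = WriteOnlyDataSet._load.__qualname__

# Assumed scenario: some other code reassigns the attribute at runtime.
WriteOnlyDataSet._load = ParentDataSet._load
after = WriteOnlyDataSet._load.__qualname__

# The attribute still lives on the subclass, but it now points at the
# parent's function, so loading no longer raises.
owner = defining_class(WriteOnlyDataSet, "_load")

print(before)          # WriteOnlyDataSet._load
print(after)           # ParentDataSet._load
print(owner.__name__)  # WriteOnlyDataSet
```

If the attribute sits in the subclass's own namespace while its __qualname__ names the parent, the override was replaced after class creation rather than never defined, which would be consistent with the source file on disk still containing the override.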
j
If I simply click on the link it opens the local file and there is a _load method that should be overriding:
def _load(self) -> NoReturn:
    raise DataSetError(f"Loading not supported for '{self.__class__.__name__}'")
You are right!
😱 1
I wonder if I can reproduce this from scratch with a new environment
j
Python, go home, you're drunk
j
haha
@Juan Luis When I install kedro-datasets, what do I get for free, as it were? I.e. if I just install kedro-datasets without passing any extras, I should get all the built-in datasets like JSON, Metrics, etc., but not things like Parquet, and so on. Is this assumption correct?
j
@Jordan actually, JSON, Metrics etc. are not "built-in", but "on the same level" as anything else. The only built-in datasets are the parent classes in kedro, like AbstractDataSet, PartitionedDataSet and so on. If you install kedro-datasets without any extras, you don't get anything
j
I have a feeling I did not do this when I set up the environment in which I have been operating. Is it the sensible thing to always prioritise the kedro-datasets implementation of a particular dataset if it is also available in the kedro core framework?
j
yes, because kedro.extras.datasets is deprecated and will go away in Kedro 0.19. we just added a DeprecationWarning in 0.18.8
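(Editorial aside: one way to express "prefer the kedro-datasets implementation, fall back to the deprecated location" is to probe which module is importable before choosing. This is a minimal sketch using the standard library only; the helper name first_available is hypothetical, and the two dotted module paths are assumptions based on the package names mentioned in this thread.)

```python
import importlib.util


def first_available(candidates):
    """Return the first importable module name from `candidates`, else None."""
    for name in candidates:
        try:
            if importlib.util.find_spec(name) is not None:
                return name
        except ModuleNotFoundError:
            # Raised when a parent package of a dotted name is missing.
            continue
    return None


# Preferred location first, deprecated location second (assumed module paths).
preferred_order = ["kedro_datasets.tracking", "kedro.extras.datasets.tracking"]
print(first_available(preferred_order))
```

Note that find_spec imports the parent packages of a dotted name, so the probe is not entirely free of side effects.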
j
Nice. Okay, I will try to correct this and report back if I can reproduce the problem.
Thanks for all the help!
j
any time! 😄
j
@Juan Luis Damn, it seems I've got the same problem with a fresh setup, it must be something that I'm doing. Maybe Poetry. I created a test repo here in case you want to try and reproduce: https://github.com/b4rlw/metrics-test
j
so weird. I see conda-lock.yml and poetry.lock, how should I try to recreate the environment? also, I see you were using VSCode right? or does this happen in a normal ipython shell too?
never mind, I see conda only installs poetry 👍🏼
j
Yeah, just the interpreter, then I like to just install the Python deps with Poetry in the same environment. Unusual I know, but everything gets a lock file this way. environment.yml has the non-standard platforms key. I've been installing with Micromamba like:
micromamba env create -f conda-lock.yml -p ./.env
micromamba activate ./.env
poetry install
Then everything should be good to go.
j
(micromamba ftw ✌️)
j
I was indeed using VSCode, I prefer their implementation of Jupyter notebooks over the default web based one. I don't usually use the ipython shell, so I can't speak to that
j
just for completeness, can you check if
$ ipython
>>> %load_ext kedro.ipython
>>> ...
displays the same weird behavior? namely, that MetricsDataSet._load seems to be gone
I doubt it's a VSCode-specific thing but better to be sure
😳 I can reproduce on IPython
j
Screenshot 2023-05-03 at 17.18.25.png
Screenshot 2023-05-03 at 17.18.30.png
j
yep, exactly
j
It's a weird one
j
okay I have a reproducer and I'm opening an issue, I'll be back
well this was kind of crazy, but here we are https://github.com/kedro-org/kedro/issues/2554
j
Nice work finding the Kedro Viz connection!
I can remove black, conda-lock and pytest from the dev deps to slim down builds while debugging if you want
j
I suspect that even a simple setup with kedro, kedro-viz, and kedro-datasets[tracking] would probably suffice, if you want to try that out (and maybe reproduce it without the need of a full Docker container). feel free to add more information to the issue @Jordan!
j
Cool, I'll experiment!
a
I know exactly what's going on here - will comment on https://github.com/kedro-org/kedro/issues/2554 after the long weekend, but if you want to understand straight away then look here onwards… Sorry it's so confusing! I can definitely appreciate how mysterious this must have seemed 😄