Is there some flag in `kedro viz` to disable datas...
# questions
m
Is there some flag in `kedro viz` to disable dataset checking? I just want to see the pipeline structure of a project, but the project itself has data catalog entries pointing to directories / files that I don't have access to locally (in S3). Right now it fails on the `.exists()` call in the datasets.
n
Are you on the latest version of viz? I don't recall it checking datasets
m
Yes, 6.7.0
It goes into `kedro_viz.server.populate_data` -> `data_access_manager.resolve_dataset_factory_patterns` -> `catalog.exists(dataset_name)`
s
Hey, I had the same problem, so I would also be interested. In some other cases the data check took too long and kedro-viz was cancelled after the 60s timeout. In those cases it would also be good to be able to deactivate the check. My interim solution was my own dataset class in which I adapted the `_exists` function...
🙈 1
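A minimal sketch of the workaround described above, assuming the idea is to override the `_exists` hook so the existence check returns without touching storage. The classes here are hypothetical stand-ins, not Kedro's own; in a real project the subclass would extend the actual dataset class, and the thread does not say which value the adapted `_exists` returned.

```python
class RemoteTableDataset:
    """Hypothetical stand-in for a dataset backed by remote storage."""

    def __init__(self, path: str) -> None:
        self.path = path

    def _exists(self) -> bool:
        # Simulates an existence check that needs credentials or network access.
        raise ConnectionError(f"cannot reach {self.path}")

    def exists(self) -> bool:
        # Public entry point that delegates to the overridable hook.
        return self._exists()


class NoCheckRemoteTableDataset(RemoteTableDataset):
    """Variant that answers the existence check without touching storage."""

    def _exists(self) -> bool:
        # Assumption: the caller only needs the call to return quickly,
        # not a truthful answer about the remote data.
        return True


# The path is an illustrative placeholder, not taken from the thread.
dataset = NoCheckRemoteTableDataset("s3://example-bucket/table")
print(dataset.exists())
```

The trade-off is that anything relying on a truthful `exists()` answer for this dataset is no longer reliable, so this fits best as a local, viz-only configuration.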
m
I imagine something like `kedro viz --skip-datasets` would be nice
👍 1
s
Another interim solution would be to go back to kedro-viz version 6.6.1. If I remember correctly, the data check did not exist yet in that version
👍🏼 1
m
That's actually super convenient, thanks @Simon Wolf
FYI @Nero Okwa
n
Interesting, I don't recall this either. FYI @Rashida Kanchwala
@Simon Wolf please provide more context on the other cases where the data check took too long and Kedro-Viz was cancelled.
s
I use a custom `PartitionedDataset` to load tables from Databricks Hive storage. In this case the `_list_partitions` function takes a while because there are many tables in the database: executing the Spark function to list all tables is slow, probably also because the tables are distributed across the cluster. For a `PartitionedDataset`, the `_exists()` function normally just calls `bool(self._list_partitions())`, so if `_list_partitions()` takes too long, the whole thing doesn't work. I could also imagine this being a problem in other scenarios (using the standard `PartitionedDataset` or other single dataset classes) if the data is not stored locally and there are therefore delays when loading/listing files...
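To illustrate why the existence check costs as much as a full listing, here is a small sketch under stated assumptions. `SlowPartitionedDataset` is a hypothetical stand-in, not Kedro's `PartitionedDataset`, and the `lookups` counter stands in for slow remote metadata calls. Only the `bool(self._list_partitions())` pattern comes from the message above.

```python
class SlowPartitionedDataset:
    """Hypothetical stand-in for a partitioned dataset over many remote tables."""

    def __init__(self, table_names) -> None:
        self._table_names = list(table_names)
        self.lookups = 0  # counts simulated remote metadata calls

    def _list_partitions(self) -> list:
        partitions = []
        for name in self._table_names:
            # Each iteration stands in for one slow remote call.
            self.lookups += 1
            partitions.append(name)
        return partitions

    def _exists(self) -> bool:
        # Same pattern as described in the thread: list everything,
        # then reduce the whole listing to a single boolean.
        return bool(self._list_partitions())


slow = SlowPartitionedDataset(f"table_{i}" for i in range(1000))
found = slow._exists()
print(found, slow.lookups)
```

Every table is visited before the check can answer, which is why a large or slow backend can push the check past a fixed timeout.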
a
I don't know if it's been released yet, but this has been resolved - https://github.com/kedro-org/kedro-viz/issues/1645
👍🏽 1
👍 2
r
Yes we will be releasing the fix for it hopefully today in Kedro-viz 7.0
m
It will probably not cover the latency problem though
r
True. Looping @Ravi Kumar Pilla
r
The check was introduced in Kedro-Viz 6.7.0 to discover dataset factory patterns, but I see the latency problem here. A quick solution could be, as @marrrcin suggested, a flag to disable this discovery. But this would also disable dataset factory pattern discovery. This needs further discussion with the team, and I will have a look at it. Thank you
👍🏽 1
Hi all, thank you for your patience. We had an internal discussion with the team and decided to drop the dataset factory pattern discovery implementation in Kedro-Viz 7.0.0. This removes the dataset existence check and will resolve the issues mentioned in this thread. However, it will also remove support for dataset factory patterns from experiment tracking. We will add this to our backlog and work on it.
👀 1
👍 1