#questions

Iñigo Hidalgo

02/15/2024, 10:29 AM
https://docs.kedro.org/projects/kedro-viz/en/stable/preview_datasets.html#preview-data-in-kedro-viz
Kedro-Viz version 6.3.0 currently supports preview of two types of datasets: `pandas.CSVDataset` and `pandas.ExcelDataset`.
Is there something I can do on my end to make a custom dataset compatible with dataset preview? I would like to show an arrow dataset's schema in the preview field

Nok Lam Chan

02/15/2024, 10:50 AM
Not yet. It will be possible in the next release via a custom type hint.

Iñigo Hidalgo

02/15/2024, 10:59 AM
cool!! so adding a `preview` method and correctly typing it with `TablePreview` would be enough?
can the preview method take any args I pass in the `preview_args` metadata? or is it meant to specifically be `nrows`?

Nok Lam Chan

02/15/2024, 12:19 PM
`preview_args` will be passed to the `preview()` method, so the arguments are arbitrary. For the type hint, you need to annotate the method with one of the preview types and return the corresponding data. For a table, for example, I think it expects a JSON-like format; the way I think of it, it's a payload sent to the frontend (Viz).
If you look at the `preview` methods in existing datasets, they have different signatures.
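A minimal sketch of what the exchange above describes, assuming the table payload follows the shape pandas produces with `to_dict(orient="split")` (`index`, `columns`, `data`). `TablePreview` is redefined locally so the snippet runs without kedro-datasets installed. `SchemaOnlyDataset` and its `max_columns` argument are hypothetical, and a real dataset would subclass Kedro's `AbstractDataset`.

```python
from typing import NewType

# Local stand-in for the TablePreview type in kedro_datasets._typing,
# redefined here so the sketch has no third-party dependencies.
TablePreview = NewType("TablePreview", dict)


class SchemaOnlyDataset:
    """Hypothetical dataset that previews its schema instead of its rows."""

    def __init__(self, schema: dict[str, str]):
        # Column name -> data type name, e.g. taken from an Arrow schema.
        self._schema = schema

    def preview(self, max_columns: int = 20) -> TablePreview:
        # The return annotation is what marks this as a table preview.
        columns = list(self._schema)[:max_columns]
        dtypes = [self._schema[name] for name in columns]
        # Assumed payload shape: pandas "split" orientation.
        return TablePreview({"index": [0], "columns": columns, "data": [dtypes]})


dataset = SchemaOnlyDataset({"id": "int64", "name": "string", "price": "double"})

# preview_args from the catalog metadata would arrive as keyword arguments.
preview_args = {"max_columns": 2}
payload = dataset.preview(**preview_args)
```

Because `preview_args` is unpacked as keyword arguments, any parameter name works as long as it matches the method's signature.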

Iñigo Hidalgo

02/15/2024, 12:51 PM
Really cool refactor, I would tag Rashida but she seems to be on holiday 😄

Nok Lam Chan

02/15/2024, 1:14 PM
She is on PTO :D I think @Antony Milne came up with the type hint idea

Antony Milne

02/19/2024, 9:23 AM
Indeed this was my idea! You can see there are some reservations about the mechanism of using type hints for this (including from myself), so I’d be interested in hearing what you think of it 🙂 One thing to remember is that you should ideally have default values for all the `preview` arguments so that it can be run without the user needing to specify explicit values in catalog.yml.
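One way to check the advice about default values is to inspect the method's signature. The sketch below is only an illustration, not something Kedro or Kedro-Viz provides: `params_without_defaults` and both dataset classes are hypothetical names.

```python
import inspect


def params_without_defaults(preview_method) -> list[str]:
    """Return the names of preview parameters that have no default value."""
    signature = inspect.signature(preview_method)
    variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    return [
        name
        for name, param in signature.parameters.items()
        if param.default is inspect.Parameter.empty and param.kind not in variadic
    ]


class GoodDataset:
    # Every argument has a default, so preview() works with no preview_args.
    def preview(self, nrows: int = 5, show_statistics: bool = False) -> dict:
        return {"index": [], "columns": [], "data": []}


class NeedsArgsDataset:
    # nrows has no default, so calling preview() with no arguments fails.
    def preview(self, nrows: int) -> dict:
        return {"index": [], "columns": [], "data": []}


good = params_without_defaults(GoodDataset().preview)
bad = params_without_defaults(NeedsArgsDataset().preview)
```

Here `good` is empty while `bad` names `nrows`, the argument a user would be forced to set in catalog.yml before the preview could run.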

Nero Okwa

02/19/2024, 10:32 AM
CC @Rashida Kanchwala
> Is there something I can do on my end to make a custom dataset compatible with dataset preview?
> I would like to show an arrow dataset's schema in the preview field
@Iñigo Hidalgo Thanks for this. Quick question: what problem would dataset preview solve for you here? How have you previously shown an Arrow dataset schema? 🙂

Iñigo Hidalgo

02/19/2024, 10:45 AM
So when looking at Kedro-Viz pipelines I generally don't care about seeing the top N rows, but I do care about seeing which columns a dataset contains, and their datatypes. So here in the preview method I would return a table with the column names and their Arrow datatypes in the row below. It could also be extended to include further summary statistics with a parameter in the preview, e.g. `def preview(self, show_statistics=False)`, where if `show_statistics` is true, it could show various statistics in further rows.
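The schema-plus-statistics idea could look roughly like the sketch below. It keeps columns as plain Python lists so it runs without pyarrow. `ColumnarDataset`, the `count` and `nulls` rows, and the "split"-style payload shape are all assumptions made for illustration, and `TablePreview` is a local stand-in for the kedro-datasets type.

```python
from typing import NewType

# Local stand-in for kedro_datasets._typing.TablePreview.
TablePreview = NewType("TablePreview", dict)


class ColumnarDataset:
    """Hypothetical in-memory dataset holding columns as plain lists."""

    def __init__(self, columns: dict[str, list], dtypes: dict[str, str]):
        self._columns = columns
        self._dtypes = dtypes

    def preview(self, show_statistics: bool = False) -> TablePreview:
        names = list(self._columns)
        # First row: the data type of each column.
        index = ["dtype"]
        rows = [[self._dtypes[name] for name in names]]
        if show_statistics:
            # Further rows: simple per-column statistics.
            index += ["count", "nulls"]
            rows.append([len(self._columns[n]) for n in names])
            rows.append([sum(v is None for v in self._columns[n]) for n in names])
        return TablePreview({"index": index, "columns": names, "data": rows})


dataset = ColumnarDataset(
    columns={"id": [1, 2, 3], "name": ["a", None, "c"]},
    dtypes={"id": "int64", "name": "string"},
)
schema_only = dataset.preview()
with_stats = dataset.preview(show_statistics=True)
```

Since `show_statistics` defaults to `False`, the preview still works when no `preview_args` are given in the catalog.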
> some reservations around the mechanism around using type hints for this (including from myself) so I’d be interested in hearing what you think of it
On the technical side I can't really comment on what its limitations could be. It's true that it's a different way of doing things than we're used to in the Python/Kedro world. Thinking of pure-Python implementations without type hints, I might have gone for separate `preview_table`, `preview_plotly` methods or something like that, but I'm sure that would have had its own set of issues. As far as the merged feature goes, I like the implementation. As long as the documentation is updated (particularly the fact that we aren't actually expected to return a table, but rather a JSON object), it is clear what we're supposed to do and how to do it.
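To make the type-hint mechanism concrete, here is a rough sketch of how a consumer could pick a renderer from the return annotation of a single `preview` method. This is not Kedro-Viz's actual code: `preview_kind` and the dataset classes are hypothetical, and `TablePreview` and `PlotlyPreview` are local stand-ins assumed to mirror the types in kedro-datasets.

```python
from typing import NewType, Optional, get_type_hints

# Local stand-ins for the preview types in kedro_datasets._typing.
TablePreview = NewType("TablePreview", dict)
PlotlyPreview = NewType("PlotlyPreview", dict)


def preview_kind(dataset) -> Optional[str]:
    """Return the name of the preview type hinted by dataset.preview, if any."""
    preview = getattr(dataset, "preview", None)
    if not callable(preview):
        return None
    hint = get_type_hints(preview).get("return")
    return getattr(hint, "__name__", None)


class SchemaDataset:
    def preview(self) -> TablePreview:
        return TablePreview({"index": [0], "columns": ["id"], "data": [["int64"]]})


class ChartDataset:
    def preview(self) -> PlotlyPreview:
        return PlotlyPreview({"data": [], "layout": {}})


class NoPreviewDataset:
    pass


kinds = [
    preview_kind(d) for d in (SchemaDataset(), ChartDataset(), NoPreviewDataset())
]
```

With this approach a dataset exposes one method name and the annotation selects the renderer. The alternative of separate `preview_table` and `preview_plotly` methods would encode the same choice in the method name instead.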

Antony Milne

02/19/2024, 11:01 AM
Thanks for the comments @Iñigo Hidalgo, that’s very interesting. Your case of data type and summary statistics is exactly the sort of thing I had in mind when designing this so I’m very pleased it’s worked out! 🙂

Iñigo Hidalgo

02/19/2024, 11:07 AM
It’s definitely a powerful feature. The more I think about it the more ideas I have. Model objects -> feature importances, architecture, etc. Datasets -> last_updated, etc.
Btw, I haven't really looked into the kedro-datasets package much. We recently upgraded from 0.17.1 to 0.18.14. To actually be able to use this feature, I guess I will need to (1) have the latest version of Kedro-Viz once it's released, and (2) on the kedro-datasets side, it should be enough to type my custom datasets with one of the types provided here, right? https://github.com/kedro-org/kedro-plugins/blob/190afed860a85dfb29e246fc252e153392b22600/kedro-datasets/kedro_datasets/_typing.py