# questions

Iñigo Hidalgo

02/15/2024, 10:29 AM

https://docs.kedro.org/projects/kedro-viz/en/stable/preview_datasets.html#preview-data-in-kedro-viz

Kedro-Viz version 6.3.0 currently supports preview of two types of datasets:

• pandas.CSVDataset

• pandas.ExcelDataset

Is there something I can do on my end to make a custom dataset compatible with dataset preview? I would like to show an arrow dataset's schema in the preview field

Nok Lam Chan

02/15/2024, 10:50 AM

Not yet, it will be possible in next release via custom type hint

Nok Lam Chan

02/15/2024, 10:51 AM

https://github.com/kedro-org/kedro-plugins/pull/504

Iñigo Hidalgo

02/15/2024, 10:59 AM

cool!! so adding a preview method and correctly typing it with TablePreview would be enough?

Iñigo Hidalgo

02/15/2024, 11:00 AM

can the preview method take any args I pass in the preview_args metadata? or is it meant to specifically be nrows?

Nok Lam Chan

02/15/2024, 12:19 PM

preview_args will be passed to the preview() method, so the arguments are arbitrary. For the type hint, you need to annotate the return value with one of the preview types and return the corresponding data. For example, for a table I think it's expecting a JSON format; the way I think of it, it's like a payload being sent to the frontend (viz).
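A minimal sketch of the forwarding described above, assuming preview_args sits under a kedro-viz key in the dataset's metadata. RowsDataset and call_preview are hypothetical names used for illustration only; this is not Kedro-Viz's actual code, and the payload shape (index, columns, data) is likewise an assumption.

```python
from typing import Any, Dict, List


class RowsDataset:
    """Hypothetical in-memory dataset, used only to illustrate preview()."""

    def __init__(self, columns: List[str], rows: List[List[Any]]):
        self._columns = columns
        self._rows = rows

    def preview(self, nrows: int = 5) -> Dict[str, Any]:
        # Return a JSON-serialisable payload rather than a table object.
        head = self._rows[:nrows]
        return {
            "index": list(range(len(head))),
            "columns": self._columns,
            "data": head,
        }


def call_preview(dataset: Any, metadata: Dict[str, Any]) -> Dict[str, Any]:
    # Assumed metadata layout: {"kedro-viz": {"preview_args": {...}}}.
    preview_args = metadata.get("kedro-viz", {}).get("preview_args", {})
    # Whatever keys are present are forwarded as keyword arguments.
    return dataset.preview(**preview_args)


dataset = RowsDataset(["id", "name"], [[1, "a"], [2, "b"], [3, "c"]])
payload = call_preview(dataset, {"kedro-viz": {"preview_args": {"nrows": 2}}})
```

Because the arguments are unpacked with `**`, any keyword that the preview method accepts could be supplied this way, not only nrows.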

Nok Lam Chan

02/15/2024, 12:20 PM

If you look at the different preview methods in existing datasets, they should have different signatures

Iñigo Hidalgo

02/15/2024, 12:51 PM

Really cool refactor, I would tag Rashida but she seems to be on holiday 😄

Nok Lam Chan

02/15/2024, 1:14 PM

She is on PTO :D I think @Antony Milne came up with the type hint idea

Antony Milne

02/19/2024, 9:23 AM

Indeed this was my idea! You can see there are some reservations about the mechanism of using type hints for this (including from myself) so I’d be interested in hearing what you think of it 🙂 One thing to remember is that you should ideally have default values for all the preview arguments so that it can be run without the user needing to specify explicit values in catalog.yml.
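One way to check the "defaults for all preview arguments" advice on a custom dataset is to inspect the method signature. This is only a sketch: preview_args_have_defaults and the two example classes are hypothetical, and nothing here is part of Kedro or Kedro-Viz.

```python
import inspect


def preview_args_have_defaults(dataset_cls: type) -> bool:
    """Return True if preview() can be called with no explicit arguments."""
    signature = inspect.signature(dataset_cls.preview)
    for name, param in signature.parameters.items():
        if name == "self":
            continue
        # *args / **kwargs never require a value.
        if param.kind in (
            inspect.Parameter.VAR_POSITIONAL,
            inspect.Parameter.VAR_KEYWORD,
        ):
            continue
        if param.default is inspect.Parameter.empty:
            return False
    return True


class WithDefaults:
    def preview(self, nrows: int = 5, show_statistics: bool = False) -> dict:
        return {}


class WithoutDefaults:
    # nrows has no default, so the catalog would have to supply it.
    def preview(self, nrows: int) -> dict:
        return {}
```

A check like this could sit in a unit test for custom datasets, so a preview method that needs explicit values in catalog.yml is caught early.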

Nero Okwa

02/19/2024, 10:32 AM

CC @Rashida Kanchwala

Nero Okwa

02/19/2024, 10:41 AM

Is there something I can do on my end to make a custom dataset compatible with dataset preview?

I would like to show an arrow dataset's schema in the preview field

@Iñigo Hidalgo Thanks for this. QQ what problem would dataset preview be solving for you for this? How have you previously shown an arrow dataset schema? 🙂

Iñigo Hidalgo

02/19/2024, 10:45 AM

So when looking at kedro viz pipelines I generally don't actually care about seeing the top N rows, but I do care about seeing what columns a certain dataset contains, and their datatypes. So here in the preview method I would return a table with the column names and their arrow datatypes in the row below. It could also be extended to include further summary statistics with a parameter in the preview i.e.

def preview(self, show_statistics=False)

where if show_statistics is true, it could show various statistics in further rows.
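A rough sketch of what such a preview method might look like. Everything here is an assumption for illustration: TablePreview is a local stand-in so the snippet runs without kedro-datasets installed, SchemaPreviewDataset is hypothetical, the schema is passed in as plain name-to-type strings (with pyarrow these could be derived from a schema's names and types), null counts stand in for "summary statistics", and the split-style payload (index, columns, data) mirrors what pandas' to_dict(orient="split") produces.

```python
from typing import Mapping, NewType, Optional

# Local stand-in for the preview type; an assumption so the sketch is
# self-contained.
TablePreview = NewType("TablePreview", dict)


class SchemaPreviewDataset:
    """Hypothetical dataset whose preview shows column names and dtypes."""

    def __init__(
        self,
        schema: Mapping[str, str],
        null_counts: Optional[Mapping[str, int]] = None,
    ):
        self._schema = dict(schema)
        self._null_counts = dict(null_counts or {})

    def preview(self, show_statistics: bool = False) -> TablePreview:
        columns = list(self._schema)
        # First row: the datatype of each column.
        index = ["dtype"]
        data = [[self._schema[name] for name in columns]]
        if show_statistics:
            # Further rows: one per statistic (only null counts here).
            index.append("null_count")
            data.append([self._null_counts.get(name) for name in columns])
        return TablePreview({"index": index, "columns": columns, "data": data})


dataset = SchemaPreviewDataset(
    {"id": "int64", "name": "string"},
    null_counts={"id": 0, "name": 3},
)
schema_only = dataset.preview()
with_stats = dataset.preview(show_statistics=True)
```

Since show_statistics has a default, the method can be called with no preview_args at all, and the statistics row could be switched on per dataset from the catalog.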

Iñigo Hidalgo

02/19/2024, 10:50 AM

some reservations around the mechanism around using type hints for this (including from myself) so I’d be interested in hearing what you think of it

on the technical side of things I can't really comment on what its possible limitations could be. It's true that it's kind of a different way of doing things than we're used to in python/kedro world. Thinking of pure-python implementations without type hints, I might have gone towards having different preview_table / preview_plotly methods or something like that, but I'm sure that would've had its own set of issues. As far as the merged feature goes, I like the implementation. As long as the documentation is updated (particularly the fact that we aren't actually expected to return a table, but rather a JSON object), it is clear what we're supposed to do and how to do it, so I like it.
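To make the contrast concrete, here is a rough sketch of how a consumer could pick the preview kind from the return annotation of a single preview method, instead of looking for separately named methods. The NewType definitions are local stand-ins, and preview_kind and the two dataset classes are hypothetical; Kedro-Viz may well do this differently.

```python
from typing import Any, NewType, Optional, get_type_hints

# Local stand-ins for the preview types (assumption, for a runnable sketch).
TablePreview = NewType("TablePreview", dict)
PlotlyPreview = NewType("PlotlyPreview", dict)


class TableDataset:
    def preview(self, nrows: int = 5) -> TablePreview:
        return TablePreview({"index": [], "columns": [], "data": []})


class PlotDataset:
    def preview(self) -> PlotlyPreview:
        return PlotlyPreview({"data": [], "layout": {}})


def preview_kind(dataset: Any) -> Optional[str]:
    """Name of the annotated return type of preview(), or None if absent."""
    preview = getattr(dataset, "preview", None)
    if preview is None:
        return None
    return_type = get_type_hints(preview).get("return")
    # NewType objects expose the name they were created with.
    return getattr(return_type, "__name__", None)
```

With this approach the dataset keeps one method name, and the annotation alone tells the caller how to render the payload.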

Antony Milne

02/19/2024, 11:01 AM

Thanks for the comments @Iñigo Hidalgo, that’s very interesting. Your case of data type and summary statistics is exactly the sort of thing I had in mind when designing this so I’m very pleased it’s worked out! 🙂

Iñigo Hidalgo

02/19/2024, 11:07 AM

It’s definitely a powerful feature. The more I think about it the more ideas I have. Model objects -> feature importances, architecture, etc. Datasets -> last_updated, etc.

Iñigo Hidalgo

02/19/2024, 11:20 AM

Btw, I haven't really looked into the kedro-datasets package much. We recently upgraded from 0.17.1 to 0.18.14. To actually be able to use this feature, I guess I will need to (1) have the latest version of kedro-viz once it's released, and (2) on the kedro/kedro-datasets side, it should be enough to type my custom datasets with one of the types provided here, right? https://github.com/kedro-org/kedro-plugins/blob/190afed860a85dfb29e246fc252e153392b22600/kedro-datasets/kedro_datasets/_typing.py
