# questions
Nicolas Oulianov
Hello all. I want to log the number of rows of the datasets at each step of my pipeline. Do you know a good way to do that? I've seen this great pull request about logging the number of columns and rows. Is this already possible in the current module? I didn't find much documentation about this.
datajoely
@Nero Okwa
Nero Okwa
Thanks @Nicolas Oulianov. This isn't in the current module yet, but it would be reviewed during backlog grooming. How does logging the number of rows of the datasets help your workflow? Is it mainly for debugging? Please add your comment to the ticket so we can consider it during the session.
Nicolas Oulianov
Yes, it’s for debugging. The goal is to notice a big drop in row count during a data transformation step. For example, after one node, I may see that my number of rows drops by 30% when it’s supposed to stay the same. How can I monitor this with the currently available tools?
datajoely
I can think of two ways:
• Logging this information via a `before_dataset_saved` hook (see the sketch below)
• Using the Experiment Tracking feature and viewing the trends there
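A minimal sketch of such a hook, assuming the data being saved is pandas-like (anything exposing `.shape`); the class and logger names are illustrative and the exact `before_dataset_saved` spec arguments can vary across Kedro versions:

```python
import logging
from typing import Any

from kedro.framework.hooks import hook_impl

logger = logging.getLogger(__name__)


class RowCountHooks:
    """Log dataset dimensions whenever a dataset is about to be saved."""

    @hook_impl
    def before_dataset_saved(self, dataset_name: str, data: Any) -> None:
        # Only objects exposing .shape (e.g. pandas DataFrames) are logged;
        # everything else is silently skipped.
        shape = getattr(data, "shape", None)
        if shape is not None:
            rows = shape[0]
            cols = shape[1] if len(shape) > 1 else 1
            logger.info("%s: %s rows, %s columns", dataset_name, rows, cols)
```

You would then register the hook in your project's `settings.py`, e.g. `HOOKS = (RowCountHooks(),)`.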
Nicolas Oulianov
Thank you @datajoely! The hook would be great for that. Do you have an example implementation, e.g. a link to a Kedro project on GitHub? I've read through the docs but can't figure out how to start setting it up.
datajoely
These are the experiment tracking docs.
It will show up in the 🧪 view in Kedro-Viz.
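If you go the experiment-tracking route, one possible sketch (dataset and function names are hypothetical) is a small node that emits row counts as a metrics dict, with its output declared in `catalog.yml` as a tracking metrics dataset (e.g. `type: tracking.MetricsDataSet` in Kedro 0.18.x; the exact type name depends on your Kedro / kedro-datasets version):

```python
import pandas as pd


def count_rows(df: pd.DataFrame) -> dict:
    # Tracking metrics datasets expect a flat dict of numeric values,
    # which Kedro-Viz can then show per run in the experiment view.
    return {"rows": float(len(df)), "columns": float(df.shape[1])}
```

You would wire it into the pipeline with something like `node(count_rows, inputs="my_dataset", outputs="row_count_metrics")`, so each run records the counts alongside your other tracked metrics.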
Nero Okwa
@Nicolas Oulianov this feature has been implemented in the latest Kedro-Viz release. Can you confirm whether it solves your pain point and provide feedback? Thanks.
Nicolas Oulianov
Hi Nero! Thanks for sharing this update. It looks great on the demo, thanks for shipping it! However, in my own project:
• By default, it shows N/A rows and columns for my ExcelDataSet and CSVDataSet
• The preview feature doesn’t work with PartitionedDataset
I will try running the pipeline and see if it fixes it 🙂
Running the pipeline fixed it! It would be great to add this to the docs about metadata somewhere (if you see N/A rows and columns, it can be because xxxx or xxx). I had to read the PR to figure this out. Anyway, great addition! Very happy that it’s in here now. Super handy.