What is the right way to go about specifying schema...
# questions
j
What is the right way to go about specifying schemas in the Data Catalog? In my case I'm specifically using Excel files, CSV files, Parquet files, and DataFrames, but the answer could be more general as well.
j
Maybe @Yolan Honoré-Rougé's kedro-pandera could help?
j
I saw that. @Yolan Honoré-Rougé, what's the current level of development on this? This is work for 7-figure paying clients, so I'm a little nervous about using such a greenfield project, but it makes sense given that we are starting to adopt pandera in general.
Not to imply anything about you or your project; I'm just trying to do my due diligence.
y
TBH, I don't know anyone using it in production, and development is quite slow, so it is clearly not a "mature" project ready for production. The main problem is that it currently lacks a lot of features, so you'll be very limited in what you can do and will likely need to customize it at some point. I guess it will be easier to write your own custom plugin from scratch rather than opening PRs against the public repo, even if I'd love that! You can use the existing code as a starting point for how to create a custom hook or plugin. Feel free to contribute, even if it is only by opening issues to suggest new features. We'll pick development back up one day and hopefully make it as production-ready as kedro-mlflow!
d
You also don't have to use the plugin. You can just annotate the function called by the Kedro node with Pandera, which is a well-tested and respected project.
You'll be decoupled from the catalog, but I think there are some advantages to that too.
y
Oh yes, my concerns were about the kedro-pandera plugin, not pandera itself, which is well suited for production!
You can also add the schema to the catalog under the metadata key. This does not perform validation, but it can be useful for documenting the project.
j
Metadata can have arbitrary key-value pairs, yes? So I could even just point to the Pandera schema definition, although this solution smells like something that goes out of date over time.
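One way to keep such a pointer from silently going stale is to check in a test that it still resolves. The sketch below is only an assumption about how that might look: the catalog is modelled as a plain dict (roughly what the YAML would parse to), the `schema` key inside `metadata` is a made-up convention rather than anything Kedro defines, and a standard-library object stands in for the real schema path so the snippet runs on its own.

```python
from importlib import import_module

# Hypothetical catalog entries as plain dicts. In a real project the pointer
# would be a dotted path into your own package; "collections.OrderedDict" is
# just a stand-in so this example runs without one.
CATALOG = {
    "raw_orders": {
        "type": "pandas.CSVDataset",
        "filepath": "data/01_raw/orders.csv",
        "metadata": {"schema": "collections.OrderedDict"},
    },
    "scratch": {"type": "pandas.ParquetDataset", "filepath": "data/tmp.parquet"},
}


def resolve_pointer(dotted_path: str):
    """Import the object that a 'package.module.attribute' string points to."""
    module_name, _, attr = dotted_path.rpartition(".")
    return getattr(import_module(module_name), attr)


def stale_pointers(catalog: dict) -> list[str]:
    """Return the names of datasets whose schema pointer no longer resolves."""
    stale = []
    for name, entry in catalog.items():
        pointer = entry.get("metadata", {}).get("schema")
        if pointer is None:
            continue  # no schema documented for this dataset
        try:
            resolve_pointer(pointer)
        except (ImportError, AttributeError, ValueError):
            stale.append(name)
    return stale
```

Asserting `stale_pointers(CATALOG) == []` in the test suite would flag a renamed or deleted schema, though it cannot tell whether the schema still matches the data.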