Kedro #plugins-integrations

Hi, Currently kedro_datasets.geopandas only handle...

Joost Gevaert

07/02/2024, 5:48 AM

Hi, currently kedro_datasets.geopandas only handles GeoJSON. But geopandas has methods for many other very useful data types as well (e.g. GeoPackage, spatial databases, Apache Parquet and Feather file formats, and more). I'd like to start by working with GeoPackages. What would be the easiest way to get started? Should I start here: https://docs.kedro.org/en/stable/data/how_to_create_a_custom_dataset.html?

Juan Luis

07/02/2024, 6:38 AM

hi @Joost Gevaert ! indeed, you’d need to create a custom dataset. or extend the geopandas one

👍 1

Juan Luis

07/02/2024, 6:38 AM

(btw, hope it’s compatible with GeoPandas 1.0! let us know otherwise)

👍 1

Joost Gevaert

07/02/2024, 7:08 AM

Yeah, GeoPackage is compatible with geopandas 1.0. GeoPackage is an awesome data format for GIS data: https://en.wikipedia.org/wiki/GeoPackage GeoPackage (GPKG) is an open, non-proprietary, platform-independent and standards-based data format for (3D) Vector and Raster GIS data
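
One practical consequence of the format being open and standards-based: a .gpkg file is an SQLite database, and the standard's `gpkg_contents` table lists what is inside it. A small sketch using only the Python standard library follows; the function name `list_gpkg_layers` is made up for illustration.

```python
import sqlite3
from contextlib import closing
from typing import List, Tuple


def list_gpkg_layers(path: str) -> List[Tuple[str, str]]:
    """Return (table_name, data_type) for every entry in gpkg_contents."""
    # closing() makes sure the connection is released after the query.
    with closing(sqlite3.connect(path)) as conn:
        rows = conn.execute(
            "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
        ).fetchall()
    return rows
```

The `data_type` column distinguishes vector layers ("features") from raster tile sets ("tiles"), which is a quick way to check what a GeoPackage contains before loading it with geopandas.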

Joost Gevaert

07/02/2024, 7:14 AM

https://www.geopackage.org/

Juan Luis

07/02/2024, 9:22 AM

oh sorry I meant kedro_datasets being compatible with GeoPandas 1.0 😄

😄 1

Juan Luis

07/02/2024, 9:22 AM

any opinions on GeoPackage vs GeoArrow btw?

Juan Luis

07/02/2024, 9:22 AM

(or shapefiles 😬 )

Joost Gevaert

07/02/2024, 9:31 AM

Well, that http://switchfromshapefile.org/ website says all there is to say about shapefiles, right? haha I really don't like them

Joost Gevaert

07/02/2024, 9:54 AM

The output of my data pipelines often goes to people who use those results in ArcGIS Pro, so I'm trying to make sure it's as easy for them to work with my results as possible. Getting .geojson files into ArcGIS Pro is a little more complicated than getting .gpkg's in there. GeoArrow, I don't really know. So far I've not had the crazy big amounts of data yet for which it's necessary to use GeoArrow. To be honest, in general I'm not very familiar with all those columnar memory formats. What's the difference between arrow, feather, parquet? How would I get those results into ArcGIS Pro? Let me ask ChatGPT 🙂

Joost Gevaert

07/02/2024, 10:05 AM

Ah, .feather = .arrow? So I guess that gpd.GeoDataFrame.to_feather would work, but then I wouldn't (yet) know how I'd get that .feather file into an ArcGIS Pro project.

Joost Gevaert

07/02/2024, 10:07 AM

ChatGPT tells me that .feather files are easier to work with than .parquet files, and that .feather files are quicker, but less compatible. Would you agree?

Juan Luis

07/02/2024, 12:21 PM

> what's the difference between arrow, feather, parquet?

let me know if this helps! https://dev.to/astrojuanlu/demystifying-apache-arrow-5b0a

❤️ 1

Juan Luis

07/02/2024, 12:21 PM

The Feather format was created alongside Arrow, and nowadays it provides decent compression (although Parquet files are usually smaller) and excellent read and write speeds (even better than Parquet). On the other hand, the Parquet format has much wider adoption and is more interoperable. If you are not sure which one is best and you're not concerned about squeezing the speed as much as possible, you can safely pick Parquet.

❤️ 1

Juan Luis

07/02/2024, 12:21 PM

(glad I could copy paste that from my past self 😄 )

😄 1

Juan Luis

07/02/2024, 12:22 PM

I think I actually meant GeoParquet originally, GeoArrow is probably too low level https://geoparquet.org/

Joost Gevaert

07/03/2024, 1:10 AM

The Demystifying Apache Arrow article definitely helped! Thanks

Joost Gevaert

07/03/2024, 1:14 AM

Regarding GeoParquet, it might come in handy once my data starts becoming too big to handle, but for now it's all doable with geopandas and GeoPackage. Hopefully I'll have a chance to also play with GeoParquet at some point in the near future, and will definitely let you know about my experience in case I do :)

💪🏼 1

Yolan Honoré-Rougé

07/04/2024, 6:56 AM

Definitely something I am interested in, working with a bunch of spatial data these days. I think these are good candidates for "experimental" datasets given our recent policy change ;)

👍 1

Juan Luis

07/14/2024, 12:04 PM

anecdote: today I was reading a 300 MB GeoJSON with GeoPandas and it took 3 minutes to load on my computer (~800k rows). I saved it to GeoParquet and now it takes 30 MB, loads in 3 seconds, and contains exactly the same information 🤯

😯 1
