New blog post: Building scalable data pipelines with Kedro and Ibis
https://kedro.org/blog/building-scalable-data-pipelines-with-kedro-and-ibis
If you have ever
• ...slurped up large amounts of data into memory, instead of pushing execution down to the source database/engine
• ...prototyped a node in pandas, and then rewritten it in PySpark/Snowpark/some other native dataframe API
• ...implemented a proof-of-concept solution in 3-4 months on data extracts, and then struggled massively when it was time to run against the production databases and scale out
• ...insisted on using Kedro across the full data engineering/data science workflow for consistency (fair enough), even though dbt would have been a much better fit for the non-ML pipelines, because you essentially needed a SQL workflow
then this blog post is for you!
Thanks to @Deepyaman Datta for writing it, and to @Cody Peterson and the rest of the Voltron team for reviewing the draft! 👏🏼