Databricks has always been ahead of its time!
# random
d
Databricks has always been ahead of its time!
😂 1
(I was actually trying to figure out when medallion architecture was introduced compared to Kedro layers, if somebody has a better answer...)
j
then notice that the Lakehouse paper is from 2021
d
Why are the Kedro layers (or something similar) not more broadly used? Obviously, Databricks is a dominant player. I guess they also focus on the more general "data analytics" piece rather than trying to also cover ML use cases (which the Kedro layers do, with model input/output, etc.)?
i
Kedro layers are 5-8 in number, medallion is 3 - simple is easier to remember, so... it was not unexpected... In fact I never liked our layer system, because it felt way too complicated...
j
I think the problem with Kedro layers is that they mix data, models, and reporting. like
│   ├── 01_raw
│   ├── 02_intermediate   ^ Data pipelines
│   ├── 03_primary        __________________
│   ├── 04_feature        v ML pipelines
│   ├── 05_model_input
│   ├── 06_models         < Inference
│   ├── 07_model_output   v Model outputs
│   ├── 08_reporting
The medallion architecture is just concerned with data pipelines, and leaves the whole MLOps story aside
i
That's not a problem per se; it just makes the Kedro layer system larger in scope. If that's the case, maybe we need to find an easy mapping from the medallion model to the first 3 or 4 Kedro layers and promote that mapping. This way we can evolve our system rather than completely abandoning it. I have no idea how widely used it is by our users and whether it's worth saving and improving.
d
I think medallion layers correspond to raw/intermediate/primary in the QB/Kedro system. https://dataengineering.wiki/Concepts/Medallion+Architecture#Medallion+Architecture+Disadvantages lists as a disadvantage of medallion that it requires further downstream processing for analytics/ML/etc., so I think the Kedro system is basically all those other layers 😂 In a large organization, the ML/DS teams are doing the feature engineering, not tossing it over to the DE, so I think it's fair that feature engineering is past medallion gold.
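A rough sketch of that mapping in Python, assuming the usual bronze/silver/gold tier names line up one-to-one with raw/intermediate/primary as guessed in this thread. `MEDALLION_TO_KEDRO` and `medallion_tier` are made-up names for illustration, not part of Kedro or Databricks:

```python
from typing import Optional

# Kedro data layers as listed in the directory tree earlier in the thread.
KEDRO_LAYERS = [
    "01_raw", "02_intermediate", "03_primary", "04_feature",
    "05_model_input", "06_models", "07_model_output", "08_reporting",
]

# Assumption: a one-to-one alignment of medallion tiers with the first three
# Kedro layers. This is the thread's guess, not an official definition.
MEDALLION_TO_KEDRO = {
    "bronze": "01_raw",
    "silver": "02_intermediate",
    "gold": "03_primary",
}


def medallion_tier(kedro_layer: str) -> Optional[str]:
    """Return the medallion tier for a Kedro layer, or None if it sits past gold."""
    if kedro_layer not in KEDRO_LAYERS:
        raise ValueError(f"unknown Kedro layer: {kedro_layer}")
    for tier, layer in MEDALLION_TO_KEDRO.items():
        if layer == kedro_layer:
            return tier
    return None  # feature, model and reporting layers have no medallion tier
```

Under this reading, everything from `04_feature` onward returns `None`, which is the "past medallion gold" part.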
j
oh that wiki looks so neat, good find!
🥲
d
Nothing for SQLMesh either, if it makes you feel better one way or another.