# random
j
yay or nay? (created with https://mymlops.com/builder)
❤️ 5
i
What about data versioning? 😛 I know you use deltatables, but I'm not sure if that actually fits the commonly-used definition of data versioning in the scope of MLOps, right?
j
good catch 🦅 my problem with data versioning is that I know what it means to version a CSV file, but I have no idea how folks are doing this in production. do you think Delta Tables are more for the data engineering side of the equation @Iñigo Hidalgo?
i
in my (limited) experience and opinion, yeah, delta tables and the equivalent SCD SQL stuff are more on the data engineering E(LT/TL) side than the ML side. While loading data "as of" a certain date is important, I think what is more relevant is the transformations applied to that source data which is a mixture of "as of", and code versioning. I don't have any real experience on this side of things beyond knowing it's something we don't really do and maybe would want to do at some point in the future 😆 We work a lot with time series data, a lot of external forecasts etc., so the first side, SCDs, is a lot more important to us right now
💡 1
👍🏼 1
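For anyone following along, here is a rough sketch of what loading data "as of" a date could look like against an SCD type 2 style history. It is plain Python with no Delta or SQL involved, and everything in it (`ForecastRow`, `load_as_of`, the `valid_from`/`valid_to` columns, the sample rows) is made up for illustration, not anyone's actual schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ForecastRow:
    """One SCD type 2 style record (hypothetical schema)."""
    series_id: str
    value: float
    valid_from: date          # first day this value was in effect
    valid_to: Optional[date]  # first day it stopped applying; None = still current


def load_as_of(rows: Iterable[ForecastRow], as_of: date) -> Dict[str, float]:
    """Return {series_id: value} as it was known on `as_of`."""
    snapshot: Dict[str, float] = {}
    for row in rows:
        started = row.valid_from <= as_of
        not_ended = row.valid_to is None or as_of < row.valid_to
        if started and not_ended:
            snapshot[row.series_id] = row.value
    return snapshot


# Illustrative history: the "demand" forecast was revised on 2024-02-01.
HISTORY = [
    ForecastRow("demand", 100.0, date(2024, 1, 1), date(2024, 2, 1)),
    ForecastRow("demand", 120.0, date(2024, 2, 1), None),
    ForecastRow("price", 9.5, date(2024, 1, 15), None),
]
```

Calling `load_as_of(HISTORY, date(2024, 1, 20))` picks up the old demand value, while a date on or after 2024-02-01 picks up the revised one. The validity interval is treated as half-open here (`valid_from` inclusive, `valid_to` exclusive), which is a choice of this sketch rather than a rule.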
j
this prompted me to ask on MLOps.Community 👀 https://mlops-community.slack.com/archives/C015J2Y9RLM/p1707134303101179
In our case, we are using a delta lake approach (+ some parquet files). CSV is prohibited
🙈
n
> I think what is more relevant is the transformations applied to that source data which is a mixture of "as of", and code versioning.
It's only truly reproducible when you have both. It sounds like a good thing, but there doesn't seem to be much demand for it? People seem to be happy with the basic versioning
j
or, nobody uses it... I don't think we have a way to know at the moment
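A minimal sketch of the "you need both" point: pin the data snapshot and the code version together in one identifier, so a run can only be called reproduced if both match. This assumes the data snapshot can be named by an integer table version (Delta tables do number their versions) and the code by a commit hash. The function name `run_fingerprint` and its parameters are hypothetical.

```python
import hashlib
import json


def run_fingerprint(table_version: int, as_of: str, code_commit: str) -> str:
    """Combine data and code versions into one short, stable identifier."""
    payload = json.dumps(
        {
            "table_version": table_version,  # which snapshot of the source data
            "as_of": as_of,                  # the "as of" date used when loading
            "code_commit": code_commit,      # which version of the transformations
        },
        sort_keys=True,  # stable key order so the hash is deterministic
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
```

Two runs with the same table version, "as of" date, and commit get the same fingerprint. Change either the data side or the code side and it differs, which is roughly what "basic versioning" of only one of them cannot tell you.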