# questions
Hello, I am using kedro with dvc for data version control. The dvc is based on
which depends on `semver >= 3`. Unfortunately I cannot install `kedro-viz 6.3.0`, which depends on `semver < 3`.
Is there any reason why
is limited to `semver < 3`? The current
. Could anyone from the kedro-viz team relax this dependency limitation?
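To illustrate why the two requirements quoted above cannot be satisfied in the same environment, here is a minimal sketch in plain Python. The helper names and the candidate version list are assumptions made for the example; this is not how pip actually resolves dependencies, only a demonstration that no version passes both `semver >= 3` and `semver < 3`.

```python
from typing import List, Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn '3.0.1' into (3, 0, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies_at_least_3(version: str) -> bool:
    """Constraint quoted in the thread: semver >= 3."""
    return parse(version) >= (3,)


def satisfies_below_3(version: str) -> bool:
    """Constraint quoted for kedro-viz 6.3.0: semver < 3."""
    return parse(version) < (3,)


# Illustrative candidates only (assumed for the example, not a full release list).
candidates: List[str] = ["2.13.0", "3.0.0", "3.0.1"]

# A version is installable only if it satisfies both constraints at once.
compatible = [
    v for v in candidates
    if satisfies_at_least_3(v) and satisfies_below_3(v)
]
print(compatible)  # prints []
```

Because the two ranges do not overlap, the list is empty for any set of candidates, which is why one of the two pins has to be relaxed.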
Sorry for the late response. Any chance you know why they did this? https://github.com/iterative/gto/pull/348 looks like just an automated bump that doesn't necessarily break anything. Cc'ing @Rashida Kanchwala @Ravi Kumar Pilla: do you know why we pin `semver <3`?
On the other hand, I am quite curious how you are using Kedro with DVC. Would you mind sharing a bit about your setup and workflow? Are you using Kedro versioning or just DVC? Cc: @Juan Luis
Hi @Nok Lam Chan, thanks for the reply. I am using dvc just to store binary files so that they stay close to my repo. I could use Git LFS, but I prefer to version binary files with DVC. GTO is a very simple “dataset registry” for my case. It uses git tags to tag artifacts as name + version, so I can quickly git checkout the proper revision (providing the artifact name + version), get the MD5 hash of the binary file (a dataset in a tar archive) and run `dvc pull`, which is pretty close to `git pull` 🙂 Everything has a nice CLI with `dvc`; see dvc.org. I do not know why they are using semver >= 3.
I wish I could use kedro’s versioned dataset, but actually
does not provide any dataset registry.
@Rafał Nowak dataset registry = being able to browse past versions of a dataset?
What do you mean by “data registry”? What are you using it for?
For me, a dataset registry is for: 1. Preparing a data file (possibly a binary file, for example a dataset created by a kedro pipeline and stored locally on disk). 2. Sending the data file to remote storage. 3. Registering the data file somehow: GTO adds a git tag to the most recent git commit. For example
is able to get the registered artifact
provided any version. So
`dvc get dataset@v0.3.5` clones the repo into a temporary dir -> checks out the
hash -> downloads the proper file from remote storage -> moves the file to your current directory 🙂 DVC can work with storage such as s3, gcp, google drive 🙂 … and many more that I do not use,
so they are using
for example, to create nice version names 😉
and compare them, bump minor, bump patch, bump major, etc.
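As a rough illustration of the version handling described above (splitting an artifact reference such as `dataset@v0.3.5` into name and version, comparing versions, and bumping major, minor, or patch), here is a small standard-library sketch. The `Version` class and `split_ref` helper are hypothetical names written for this example; they are not the API of the `semver` package or of GTO, and pre-release or build suffixes are deliberately not handled.

```python
from typing import NamedTuple, Tuple


class Version(NamedTuple):
    """Hypothetical three-part version; tuple ordering gives comparison for free."""
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        # Accept both '0.3.5' and 'v0.3.5'.
        major, minor, patch = (int(p) for p in text.lstrip("v").split("."))
        return cls(major, minor, patch)

    def bump_major(self) -> "Version":
        return Version(self.major + 1, 0, 0)

    def bump_minor(self) -> "Version":
        return Version(self.major, self.minor + 1, 0)

    def bump_patch(self) -> "Version":
        return Version(self.major, self.minor, self.patch + 1)

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


def split_ref(ref: str) -> Tuple[str, Version]:
    """Split 'dataset@v0.3.5' into ('dataset', Version(0, 3, 5))."""
    name, _, version_text = ref.partition("@")
    return name, Version.parse(version_text)


name, version = split_ref("dataset@v0.3.5")
print(name, version)                            # dataset 0.3.5
print(version.bump_patch())                     # 0.3.6
print(version.bump_minor())                     # 0.4.0
print(version.bump_major())                     # 1.0.0
print(version < Version.parse("v0.10.0"))       # True (numeric, not string, comparison)
```

Comparing parsed tuples rather than raw strings matters here: as strings, `"0.10.0"` would sort before `"0.3.5"`.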
Thanks a lot for the insight @Rafał Nowak. It's not the first time someone uses DVC alongside Kedro: https://kedro-org.slack.com/archives/C03RKP2LW64/p1683296725893669?thread_ts=1683296578.127849&cid=C03RKP2LW64 We should definitely look more into this.