# questions
m
`kedro-datasets`: dependencies
Hi Team, where do you define dependencies for the
kedro-datasets
package? We ran into some pip resolver issues, and it turned out that
kedro-datasets==1.0.0
and above require kedro to be
kedro~=0.18.4
. We can verify this by
pip install kedro-datasets==0.0.7 --dry-run
but we can’t find where this dependency is actually defined. In
setup.py
it’s not mentioned. Thank you! @Kasper Janehag
n
@Mate Scharnitzky
kedro-datasets
was introduced with kedro 0.18.4, but it doesn’t strictly depend on kedro>=0.18.4. With kedro>=0.18.4, Kedro will automatically look for
kedro-datasets
. If you are using an older Kedro version you can still upgrade to kedro-datasets, but you need to define the full import path. I believe we have already removed this strict dependency; it was a mistake. Let me check and get back to you later.
Can you try a newer datasets version? The latest one is 1.2.0
m
Currently we can’t because we depend on kedro==0.18.3. In order to upgrade our kedro version, we need to figure out how to work with omegaconf, multi-runner et al. We still use the Jinja-templated config loader, which causes some errors when upgrading to a higher version of kedro.
Just to clarify, we’re good with kedro==0.18.3 and kedro-datasets==0.0.7.
n
It should be possible to use kedro==0.18.3 and kedro-datasets==1.2.0. Is it causing any problem?
The hard dependency is in 1.0.0 and we removed it later, so it’s a known issue
m
Are you sure?
image.png
👍🏼 1
Then, why would pip try to collect
kedro~=0.18.4
?
n
Thanks, it’s definitely a mistake. We opened a PR for this 2 months ago but somehow it hasn’t been merged yet: https://github.com/kedro-org/kedro-plugins/pull/140
m
Ah I see.
n
It’s an oversight, I am now moving it into our board to make sure we track it and do a release soon.
m
Cool, no issues. Then the answer is clear: the deps should be defined in the requirements.txt.
j
also, to your original question, dependencies are defined in
pyproject.toml
, the modern standard for Python project metadata https://github.com/kedro-org/kedro-plugins/blob/246e05f06063598279d03b86590258f9a2b343a0/kedro-datasets/pyproject.toml#L13-L15
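A side note on verifying this locally: whichever file a project declares its dependencies in, an installed release exposes them as `Requires-Dist` metadata, and the standard library can read that. The sketch below is only an illustration; the helper names `requirements_on` and `declared_requirements` are made up for this example, and the result depends on which `kedro-datasets` version (if any) is installed in the current environment.

```python
from importlib import metadata
import re


def requirements_on(requires, package):
    """Return the requirement strings in `requires` that target `package`."""
    # Requirement strings look like "kedro~=0.18.4" or
    # "pandas>=1.3; extra == 'pandas'". The project name is the leading
    # run of letters, digits, '.', '-' and '_'.
    wanted = re.sub(r"[-_.]+", "-", package).lower()
    matches = []
    for req in requires or []:
        name = re.match(r"[A-Za-z0-9._-]+", req.strip())
        if name and re.sub(r"[-_.]+", "-", name.group(0)).lower() == wanted:
            matches.append(req)
    return matches


def declared_requirements(dist_name):
    """Read the Requires-Dist entries of an installed distribution."""
    try:
        return metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        # Nothing to inspect if the distribution is not installed.
        return []


if __name__ == "__main__":
    # Prints whatever the installed kedro-datasets declares about kedro.
    print(requirements_on(declared_requirements("kedro-datasets"), "kedro"))
```

This only inspects what is already installed, so it complements the `pip install ... --dry-run` check rather than replacing it.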
m
Thanks, Juan! Yes, I did check the
.toml
file, but I don’t find it in
1.2.0
. Is this released?
j
nope, this is not released yet. hopefully Very Soon ™️
m
Hi @Juan Luis, @Nok Lam Chan FYI: @Kasper Janehag Just a follow-up on the above. We have observed the pip resolver error below for a while in our repository. It’s annoying because it’s not deterministic: sometimes it fails, sometimes it works. We tried to fix it in multiple rounds, e.g., using
kedro-datasets[<>]
instead of
kedro[<>]
, but it never went completely away. Eventually, we believe it traces back to this pip issue. Essentially, pip gets confused when multiple dependencies are defined with the
~=
. Given that
kedro-datasets
pins
pandas~=1.3
along with other dependencies we have in our product for pandas, e.g.,
pandas~=1.3.0
it triggered this error for us. Now, we have a PR that refactors
kedro-datasets[pandas.<>]
to first order dependencies, and instead of using
~=
, we use lower and upper bounds, e.g.,
pandas>=1.3, <2.0
Finally, the questions:
• Have you or other users observed this or a similar error?
• Do you have any other idea how this could be fixed?
Obviously, using
kedro-datasets
would be a much more elegant way of dealing with these dependencies, but it seems until the pip issue is resolved, this would be a problem for us.
pip._vendor.resolvelib.resolvers.InconsistentCandidate: Provided candidate LinkCandidate(.../pandas-1.5.3... (from https://...) (requires-python:>=3.8)') does not satisfy SpecifierRequirement('pandas~=1.3'), SpecifierRequirement('pandas~=1.3.0'), SpecifierRequirement('pandas'), SpecifierRequirement('pandas>=1.1.1'), SpecifierRequirement('pandas'), SpecifierRequirement('pandas'), SpecifierRequirement('pandas~=1.3.0'), SpecifierRequirement('pandas~=1.3.0'), SpecifierRequirement('pandas~=1.3.0'), SpecifierRequirement('pandas>=1.0'), SpecifierRequirement('pandas~=1.3.0'), SpecifierRequirement('pandas'), SpecifierRequirement('pandas~=1.3.0'), SpecifierRequirement('pandas~=1.3.0'), SpecifierRequirement('pandas'), SpecifierRequirement('pandas'), SpecifierRequirement('pandas'), SpecifierRequirement('pandas~=1.3.0'), SpecifierRequirement('pandas~=1.3.0'), SpecifierRequirement('pandas~=1.3.0'), SpecifierRequirement('pandas~=1.3.0'), SpecifierRequirement('pandas~=1.3.0'), SpecifierRequirement('pandas'), SpecifierRequirement('pandas~=1.3.0'), SpecifierRequirement('pandas~=1.3.0'), SpecifierRequirement('pandas'), SpecifierRequirement('pandas~=1.3.0'), SpecifierRequirement('pandas'), SpecifierRequirement('pandas>=0.21.1'), SpecifierRequirement('pandas>=1.0.0'), SpecifierRequirement('pandas')
👀 1
n
I have never seen this. If you are setting pandas~=1.3.0, I believe it means 1.3.* instead of >=1.3, <2.0. So the fix looks right to me.
m
We wanted to be more flexible and aligned with kedro’s
~=1.3
hence we set it to
>=1.3, <2.0
.
n
Does this cause any issue? I believe it would work too if I set ~=1.3, but you set it as ~=1.3.0, which may confuse pip
m
we had issues with ~=1.3 as well
n
🤔 interesting, so it fails even if Kedro-datasets and pandas have the same pin?
m
I don’t have the log in hand, but I definitely tried it
We’ll keep monitoring it and will probably try to get back to kedro-datasets while upgrading to higher versions of kedro, but I just wanted to check if you’ve seen this before.
👍🏼 1
n
Did you keep the requirements file that caused the issue? I just quickly tried pinning both kedro-datasets and pandas and I don’t have any problem, using pip 22.0.4 on a Windows machine.
Pinning 1.3 or 1.3.0 doesn’t cause any error. The former results in pandas 1.5.3 and the latter in pandas 1.3.5.
m
We only reproduced this error on CI, while building dependencies at the package or global level
👍🏼 1
n
Alright, I guess this may only happen with more complicated dependencies. I don’t have experience with this, pinging @Juan Luis in case he has seen this before.
j
Notice that
~=1.3
is not the same as
~=1.3.0
👍🏼 1
I’m actually surprised that it only fails sometimes
The latter means
>=1.3.0,<1.4
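To make the difference concrete, here is a minimal sketch of how a compatible-release pin expands into a lower and an upper bound. It is a simplification written for this thread, not pip's or `packaging`'s implementation: it only handles plain numeric releases such as `1.3` or `1.3.0` (no pre-releases, epochs or local versions), and the function names are invented for the example.

```python
def parse_release(text):
    """Turn '1.3.0' into (1, 3, 0). Only plain numeric releases are handled."""
    return tuple(int(part) for part in text.split("."))


def _pad(a, b):
    """Zero-pad two release tuples to the same length so they compare fairly."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def compatible_release_bounds(version):
    """Expand '~=version' into (inclusive lower bound, exclusive upper bound)."""
    release = parse_release(version)
    if len(release) < 2:
        raise ValueError("~= needs at least two release segments")
    # Drop the last segment and bump the one before it:
    # ~=1.3 -> upper bound 2, ~=1.3.0 -> upper bound 1.4
    upper = release[:-2] + (release[-2] + 1,)
    return release, upper


def satisfies(candidate, version):
    """Check whether `candidate` falls inside the '~=version' range."""
    lower, upper = compatible_release_bounds(version)
    cand = parse_release(candidate)
    cand_lo, lower = _pad(cand, lower)
    cand_up, upper = _pad(cand, upper)
    return cand_lo >= lower and cand_up < upper


print(satisfies("1.5.3", "1.3"))    # ~=1.3 allows any 1.x release from 1.3 on
print(satisfies("1.5.3", "1.3.0"))  # ~=1.3.0 stays within 1.3.x
print(satisfies("1.3.5", "1.3.0"))
```

Under these assumptions pandas 1.5.3 fits `~=1.3` but not `~=1.3.0`, which lines up with the two outcomes reported earlier in the thread (1.5.3 versus 1.3.5).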
m
I know it’s different
1.3.0 was needed for other reasons, doesn’t come from kedro
j
(sorry answering from the phone - 1 minute and I’ll give a proper response)
m
If you look at the log, the problem is that pip finds 1.5.3 as a candidate, but it should never have considered it in the first place, given we have ~=1.3.0.
j
oh, now I understand
for the record, I left a comment in the pip issue https://github.com/pypa/pip/issues/9613#issuecomment-1573472317 I'll see if I can quickly reproduce
yep, I can reproduce 😄
if you have a
requirements.txt
with these contents and try
pip install -r requirements.txt
with pip 22.0.4 on Python 3.8 (tested on macOS), it gives the error you showed @Mate Scharnitzky :
pandas~=1.3
pandas~=1.3.0
pandas
pandas>=1.1.1
pandas
pandas
pandas~=1.3.0
pandas~=1.3.0
pandas~=1.3.0
pandas>=1.0
pandas~=1.3.0
pandas
pandas~=1.3.0
pandas~=1.3.0
pandas
pandas
pandas
pandas~=1.3.0
pandas~=1.3.0
pandas~=1.3.0
pandas~=1.3.0
pandas~=1.3.0
pandas
pandas~=1.3.0
pandas~=1.3.0
pandas
pandas~=1.3.0
pandas
pandas>=0.21.1
pandas>=1.0.0
pandas
👍🏼 1
👍 1
and latest pip has the same problem
I wanted to try if the newer
packaging
version fixes the issue but pip is incompatible with it.
</rabbit_hole>
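For reference, the distinct pins in that reproduction file can be intersected by hand to see which pandas versions are actually allowed. The sketch below is a rough check and does not model how pip's resolver works: it only understands `~=`, `>=` and bare `pandas` lines with at most three numeric segments, duplicates from the file are dropped because they do not change the intersection, and all names are made up for this example.

```python
import re

# Distinct lines from the reproduction file (duplicates removed).
REQUIREMENTS = """\
pandas~=1.3
pandas~=1.3.0
pandas
pandas>=1.1.1
pandas>=1.0
pandas>=0.21.1
pandas>=1.0.0
"""
LINES = [line for line in REQUIREMENTS.splitlines() if line.strip()]


def release(text):
    """Turn '1.3.0' into (1, 3, 0)."""
    return tuple(int(part) for part in text.split("."))


def pad(rel, width=3):
    """Zero-pad a release tuple to three segments."""
    return rel + (0,) * (width - len(rel))


def bounds(line):
    """Return (lower, upper) for one line; None means unbounded on that side."""
    match = re.fullmatch(r"pandas(?:\s*(~=|>=)\s*([\d.]+))?", line.strip())
    if match is None:
        raise ValueError(f"unsupported requirement: {line!r}")
    op, version = match.groups()
    if op is None:
        return None, None
    rel = release(version)
    if op == ">=":
        return pad(rel), None
    # "~=": drop the last segment and bump the one before it.
    return pad(rel), pad(rel[:-2] + (rel[-2] + 1,))


def intersect(lines):
    """Combine all lines into the tightest (lower, upper) pair."""
    lower, upper = (0, 0, 0), None
    for line in lines:
        lo, up = bounds(line)
        if lo is not None and lo > lower:
            lower = lo
        if up is not None and (upper is None or up < upper):
            upper = up
    return lower, upper


def allowed(candidate, lines):
    """Check a candidate version against the combined bounds."""
    lower, upper = intersect(lines)
    cand = pad(release(candidate))
    return cand >= lower and (upper is None or cand < upper)


print(intersect(LINES))
print(allowed("1.5.3", LINES), allowed("1.3.5", LINES))
```

With these simplifications the combined range is `>=1.3.0, <1.4`, so the pins are satisfiable (1.3.5 fits) and 1.5.3 lies outside the range, which is consistent with the point above that pip should not have ended up with 1.5.3 as the candidate.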
m
Thanks Juan for checking!
🙌🏼 1