Kedro #questions: kedro 0.19.6 with iris-databricks starter

Erwin

06/20/2024, 2:43 PM

Hey guys! Today I'm using kedro 0.19.6 with the iris-databricks starter, whose requirements.txt (https://github.com/kedro-org/kedro-starters/blob/db79aec64c4a0f062321bd8c74ee78275[…]-iris/%7B%7B%20cookiecutter.repo_name%20%7D%7D/requirements.txt) contains:

kedro-datasets[spark.SparkDataset, pandas.ParquetDataset]>=1.0

I got the following today

WARNING: kedro-datasets 3.0.1 does not provide the extra 'pandas.parquetdataset'
WARNING: kedro-datasets 3.0.1 does not provide the extra 'spark.sparkdataset'

Deepyaman Datta

06/20/2024, 2:46 PM

Can you try kedro-datasets[pandas-parquetdataset]?

Juan Luis

06/20/2024, 2:48 PM

it's a bug in the starter indeed!

Deepyaman Datta

06/20/2024, 2:49 PM

I don't fully understand, to be honest: why does pandas.ParquetDataset not get normalized to pandas-parquetdataset, such that users can keep writing with the dot? @Juan Luis feel like you know 🙂

Erwin

06/20/2024, 2:50 PM

Should I change from

kedro-datasets[spark.SparkDataset, pandas.ParquetDataset]>=1.0

to

kedro-datasets[spark-sparkdataset, pandas-parquetdataset]>=1.0

? Or is it expected to get normalized?

👍 1

Deepyaman Datta

06/20/2024, 2:51 PM

Try changing it and see if it works 🙂

Nok Lam Chan

06/20/2024, 2:51 PM

I think it depends on your pip version.

Deepyaman Datta

06/20/2024, 2:52 PM

As per https://packaging.python.org/en/latest/specifications/name-normalization/ (also: https://peps.python.org/pep-0503/#normalized-names), it should be normalized
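
For reference, the normalization rule in the linked specification can be sketched in a few lines of Python. This is only an illustration of the rule (runs of `-`, `_` and `.` collapse to a single `-`, then everything is lowercased); the function name `normalize_extra` is made up for this sketch and is not part of pip or Kedro.

```python
import re


def normalize_extra(name: str) -> str:
    """Normalize a project or extra name per PEP 503 / PEP 685.

    Runs of '-', '_' and '.' become a single '-', and the result is lowercased.
    """
    return re.sub(r"[-_.]+", "-", name).lower()


# The two extras from the starter's requirements.txt:
print(normalize_extra("pandas.ParquetDataset"))  # pandas-parquetdataset
print(normalize_extra("spark.SparkDataset"))     # spark-sparkdataset
```

Under this rule the dotted and the hyphenated spellings name the same extra, so whether the dotted form works depends on whether the installer applies the rule to extras.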

Nok Lam Chan

06/20/2024, 2:53 PM

Using the normalised form is preferred imo. If you bump your pip version, I expect it to get normalised automatically.

Juan Luis

06/20/2024, 2:53 PM

agreed with @Nok Lam Chan, it depends on the pip version

Erwin

06/20/2024, 2:53 PM

Great, however this is a Databricks runtime. I'm not sure whether I can, or should, modify the pip version, given that Databricks runtimes are supposed to be "well tested" environments.

🤮 1

Nok Lam Chan

06/20/2024, 2:53 PM

The change was introduced in pip 23.x IIRC; that was the motivation for us to switch to the normalised form.

Deepyaman Datta

06/20/2024, 2:56 PM

Using the normalised form is preferred imo

I feel like the whole point of normalization is that you don't have to do the "preferred" thing, since mypackage-somereallyreallylongandunreadabledataset is harder to read than mypackage.SomeReallyReallyLongAndUnreadableDataset 🙂 But point taken on it being introduced recently to pip

Deepyaman Datta

06/20/2024, 2:56 PM

(and that other managers may not fully conform to normalization rules)

Juan Luis

06/20/2024, 2:57 PM

specifically, https://github.com/pypa/pip/issues/11715 which shipped PEP 685

Deepyaman Datta

06/20/2024, 2:58 PM

So not even released (non-beta) maybe?

Juan Luis

06/20/2024, 3:54 PM

sorry, it was actually pip 23.3 https://github.com/kedro-org/kedro-plugins/issues/553#issuecomment-1939220129
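
A quick way to check whether an environment (for example a Databricks runtime) has a pip at or above the 23.3 threshold mentioned here is sketched below. The helper names are hypothetical, and the 23.3 cut-off is taken from this thread rather than verified independently for every installer.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def pip_normalizes_extras(pip_version: str) -> bool:
    """Return True if the given pip version string is 23.3 or newer.

    Only the leading "major.minor" part is compared; the threshold comes
    from the discussion above and is an assumption of this sketch.
    """
    match = re.match(r"(\d+)\.(\d+)", pip_version)
    if match is None:
        raise ValueError(f"Unrecognised pip version: {pip_version!r}")
    return (int(match.group(1)), int(match.group(2))) >= (23, 3)


def installed_pip_normalizes_extras() -> Optional[bool]:
    """Check the pip installed in the current environment, or None if absent."""
    try:
        return pip_normalizes_extras(version("pip"))
    except PackageNotFoundError:
        return None


print(installed_pip_normalizes_extras())
```

If this reports `False`, writing the extras in the already-normalised form (`pandas-parquetdataset`) avoids depending on the installer's behaviour.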

Nok Lam Chan

06/20/2024, 4:03 PM

But point taken on it being introduced recently to pip

I understand the argument. Like you said, there are many package managers and we have no control over the version that the user is using. pip is usually considered a build requirement, so we cannot pin pip>=23.3 as part of the requirements. Another thing that I don't understand is that pip install kedro[notexist] works fine. It just ignores the extra, so with an older pip version it's very hard to know whether you actually installed the correct version or not.

Nok Lam Chan

06/20/2024, 4:06 PM

^ This caused a lot of problems before, as CI / pip install all looks fine until the pipeline fails.
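
Since an unknown extra is skipped rather than treated as an error, one possible guard is to compare the extras you asked for against the `Provides-Extra` entries in the installed package's metadata, for example as a CI step after `pip install`. The sketch below assumes that approach; `missing_extras` and `provided_extras` are hypothetical helpers, and the comparison uses normalised names, so it only catches extras the package does not declare at all. It does not prove that the extra's dependencies were actually installed.

```python
import re
from importlib.metadata import metadata
from typing import Iterable, List


def normalize_extra(name: str) -> str:
    """Normalize an extra name per PEP 685 (same rule as PEP 503)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def missing_extras(requested: Iterable[str], provided: Iterable[str]) -> List[str]:
    """Return the requested extras that the package does not declare."""
    provided_normalized = {normalize_extra(extra) for extra in provided}
    return [
        extra for extra in requested
        if normalize_extra(extra) not in provided_normalized
    ]


def provided_extras(distribution: str) -> List[str]:
    """Read the Provides-Extra entries of an installed distribution."""
    return metadata(distribution).get_all("Provides-Extra") or []


# Illustration with a hand-written list of declared extras:
declared = ["pandas-parquetdataset", "spark-sparkdataset"]
print(missing_extras(["pandas.ParquetDataset", "notexist"], declared))  # ['notexist']
```

In a real check, `declared` would come from something like `provided_extras("kedro-datasets")`, and a non-empty result would fail the build instead of surfacing later as a broken pipeline.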
