Kedro #questions: Kedro Datasets `pandas` dependencies

Mate Scharnitzky

04/20/2023, 8:29 AM

Kedro Datasets: `pandas` dependencies

Hi All, what is the recommended way to handle dependencies for Kedro datasets together with other dependencies in a repo?
• either specifying them through kedro, e.g., `kedro[pandas.ExcelDataSet]`
• or using `kedro_datasets`?

Context
• We’re in the process of upgrading our Python env from `3.7` to `3.9`.
• Our current kedro version is `0.18.3`.
• When upgrading our branch to `Python 3.9` and keeping all other things intact, we get a requirement compilation error for `pandas`. In our repo, we consistently pin pandas to `~=1.3.0`, which should be aligned with kedro’s pin `~=1.3`, defined in the form of `kedro[pandas.ExcelDataSet]==0.18.3`. Interestingly and surprisingly, if we remove `kedro[pandas.ExcelDataSet]==0.18.3`, the compilation error disappears, while `openpyxl` is missing (the latter is expected).
• We’re thinking of changing the way we load kedro dataset dependencies and using `kedro_datasets` instead, but we would like your guidance on the recommended way to handle kedro dataset dependencies, especially from a maintenance point of view.

Thank you!
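For reference, the two pins mentioned above do not describe the same range. Under PEP 440's compatible release operator, `~=1.3.0` allows only 1.3.x releases, while `~=1.3` allows any 1.x release from 1.3 onward. The sketch below illustrates that rule with the standard library only. The helper names are hypothetical, it handles plain numeric versions only (no pre-releases or epochs), and it is not a diagnosis of the compilation error in the question.

```python
def compatible_release_bounds(version):
    """Return (lower_inclusive, upper_exclusive) for `~=version` per PEP 440.

    Hypothetical helper for illustration; numeric release segments only.
    """
    parts = tuple(int(p) for p in version.split("."))
    if len(parts) < 2:
        raise ValueError("~= needs at least two release segments")
    # Drop the last segment, then bump the new last segment by one.
    prefix = parts[:-1]
    upper = prefix[:-1] + (prefix[-1] + 1,)
    return parts, upper


def _pad(parts, width):
    # Pad with zeros so that 1.3 and 1.3.0 compare as equal.
    return parts + (0,) * (width - len(parts))


def satisfies(candidate, version):
    """Check whether `candidate` falls inside the `~=version` range."""
    lower, upper = compatible_release_bounds(version)
    cand = tuple(int(p) for p in candidate.split("."))
    width = max(len(cand), len(lower), len(upper))
    return _pad(lower, width) <= _pad(cand, width) < _pad(upper, width)


if __name__ == "__main__":
    for pin in ("1.3.0", "1.3"):
        for candidate in ("1.3.5", "1.4.0", "2.0.0"):
            print(f"{candidate} satisfies ~={pin}: {satisfies(candidate, pin)}")
```

The two ranges overlap on the 1.3.x releases, so the pins are compatible in principle, even though `~=1.3` on its own is the looser of the two.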

Juan Luis

04/20/2023, 8:34 AM

hello @Mate Scharnitzky! the right answer is "using `kedro_datasets`"

Juan Luis

04/20/2023, 8:35 AM

datasets in `kedro` are going away in 0.19.0

Juan Luis

04/20/2023, 8:35 AM

if you need any guidance in the migration, let us know

Juan Luis

04/20/2023, 8:37 AM

notice that you can do `pip install kedro-datasets[pandas.ExcelDataSet]`
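In a requirements file, that suggestion amounts to moving the extras from `kedro` to `kedro-datasets`. A minimal sketch of the rewrite, assuming one requirement per line: `split_kedro_extras` is a hypothetical helper, and `kedro-datasets` is left unpinned because the thread does not name a version.

```python
import re

# Matches lines such as "kedro[pandas.ExcelDataSet]==0.18.3".
_KEDRO_EXTRAS = re.compile(r"^kedro\[(?P<extras>[^\]]+)\](?P<spec>.*)$")


def split_kedro_extras(line):
    """Move dataset extras from a `kedro[...]` requirement to `kedro-datasets[...]`.

    Hypothetical helper for illustration. Lines that are not a
    `kedro[...]` requirement are returned unchanged.
    """
    stripped = line.strip()
    match = _KEDRO_EXTRAS.match(stripped)
    if match is None:
        return [stripped]
    return [
        # Keep the original version specifier on kedro itself.
        f"kedro{match['spec']}",
        # No version pin here: choosing one is left to the reader.
        f"kedro-datasets[{match['extras']}]",
    ]


if __name__ == "__main__":
    for requirement in ("kedro[pandas.ExcelDataSet]==0.18.3", "pandas~=1.3.0"):
        print(split_kedro_extras(requirement))
```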

Mate Scharnitzky

04/20/2023, 8:38 AM

Thank you, Juan!

🙌🏼 1

Juan Luis

04/20/2023, 9:47 AM

also, thanks to your message we realized that this was not at all clear in our docs or the CLI 😄 https://github.com/kedro-org/kedro/issues/1501 so, thank you!

🫡 1
