Kedro Datasets: `pandas` dependencies
# questions
m
Kedro Datasets: `pandas` dependencies

Hi All, What is the recommended way to handle dependencies for Kedro datasets together with other dependencies in a repo?
• either specifying them through kedro, e.g., `kedro[pandas.ExcelDataSet]`
• or using `kedro_datasets`?

Context
• We're in the process of upgrading our Python env from `3.7` to `3.9`
• Our current kedro version is `0.18.3`
• When upgrading our branch to `Python 3.9` and keeping everything else intact, we get a requirement compilation error for `pandas`. In our repo, we consistently pin pandas to `~=1.3.0`, which should be aligned with kedro's pin `~=1.3` defined in the form of `kedro[pandas.ExcelDataSet]==0.18.3`. Interestingly and surprisingly, if we remove `kedro[pandas.ExcelDataSet]==0.18.3`, the compilation error disappears, while `openpyxl` is missing (the latter is expected).
• We're thinking of changing the way we load kedro dataset dependencies and using `kedro_datasets` instead, but we would like your guidance on the recommended way to handle kedro dataset dependencies, especially from a maintenance point of view.

Thank you!
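For readers comparing the two pins quoted above, here is a minimal sketch of how the `~=` (compatible release) operator expands under PEP 440. It does not diagnose the compilation error; it only shows that `~=1.3.0` and `~=1.3` are not the same range, though they overlap on the 1.3.x series. The helper names `compatible_release_bounds` and `allows` are hypothetical, and the sketch assumes plain numeric versions (no pre-releases or local tags).

```python
def compatible_release_bounds(spec):
    """Translate '~=X.Y[.Z]' into (inclusive lower, exclusive upper) tuples."""
    if not spec.startswith("~="):
        raise ValueError("expected a '~=' specifier")
    parts = [int(p) for p in spec[2:].strip().split(".")]
    if len(parts) < 2:
        raise ValueError("'~=' needs at least two release segments")
    lower = tuple(parts)
    # Drop the last segment, then bump the new last segment by one.
    prefix = parts[:-1]
    prefix[-1] += 1
    return lower, tuple(prefix)


def allows(spec, version):
    """Return True if a plain numeric version satisfies the '~=' specifier."""
    lower, upper = compatible_release_bounds(spec)
    v = tuple(int(p) for p in version.split("."))
    width = max(len(v), len(lower))

    def pad(t):
        # Pad with zeros so that 1.3 and 1.3.0 compare as equal.
        return t + (0,) * (width - len(t))

    return pad(lower) <= pad(v) < pad(upper)


if __name__ == "__main__":
    for candidate in ("1.3.5", "1.4.0", "2.0.0"):
        print(candidate, allows("~=1.3.0", candidate), allows("~=1.3", candidate))
```

Under these rules `~=1.3.0` means `>=1.3.0, <1.4`, while `~=1.3` means `>=1.3, <2`, so a resolver given both can only pick a 1.3.x release.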
j
hello @Mate Scharnitzky! the right answer is "using `kedro_datasets`"
datasets in `kedro` are going away in 0.19.0
if you need any guidance in the migration, let us know
notice that you can do `pip install kedro-datasets[pandas.ExcelDataSet]`
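During a migration like the one suggested here, code may need to work whether the dataset class comes from the standalone package or from the copy bundled with kedro. A minimal sketch, assuming the module paths `kedro_datasets.pandas` and `kedro.extras.datasets.pandas` (the latter is the kedro 0.18.x location and is not mentioned in the thread); `resolve_excel_dataset` is a hypothetical helper, and class names may differ in other releases.

```python
from importlib import import_module

# Candidate module paths, preferred first. Both are assumptions:
# `kedro_datasets` is the standalone package, `kedro.extras.datasets`
# is the legacy location bundled with kedro 0.18.x.
CANDIDATES = ("kedro_datasets.pandas", "kedro.extras.datasets.pandas")


def resolve_excel_dataset(candidates=CANDIDATES, class_name="ExcelDataSet"):
    """Return (module_path, class) for the first importable class, else None."""
    for path in candidates:
        try:
            module = import_module(path)
            cls = getattr(module, class_name)
        except (ImportError, AttributeError):
            # Package not installed, or this release does not expose the class.
            continue
        return path, cls
    return None


if __name__ == "__main__":
    found = resolve_excel_dataset()
    print("not installed" if found is None else f"using {found[0]}")
```

Once the migration is complete, a direct import from `kedro_datasets` is simpler than keeping a fallback like this around.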
m
Thank you, Juan!
j
also, thanks to your message we realized that this was not at all clear in our docs or the CLI 😄 https://github.com/kedro-org/kedro/issues/1501 so, thank you!