Mate Scharnitzky
04/20/2023, 8:29 AM
pandas dependencies
Hi All,
What is the recommended way to handle dependencies for Kedro datasets alongside the other dependencies in a repo:
• specifying them through kedro extras, e.g., kedro[pandas.ExcelDataSet], or
• using kedro_datasets?
Context
• We’re in the process of upgrading our Python environment from 3.7 to 3.9
• Our current kedro version is 0.18.3
• When upgrading our branch to Python 3.9 and keeping everything else unchanged, we get a requirements compilation error for pandas. In our repo, we consistently pin pandas to ~=1.3.0, which should be compatible with kedro’s pin ~=1.3 that comes with kedro[pandas.ExcelDataSet]==0.18.3. Surprisingly, if we remove kedro[pandas.ExcelDataSet]==0.18.3, the compilation error disappears, although openpyxl is then missing (the latter is expected).
• We’re thinking of changing the way we load Kedro dataset dependencies and using kedro_datasets instead, but we would like your guidance on the recommended way to handle Kedro dataset dependencies, especially from a maintenance point of view.
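For reference, the two pandas pins mentioned in the question overlap under the PEP 440 compatible-release rule: ~=1.3.0 means >=1.3.0, <1.4.0, while ~=1.3 means >=1.3, <2.0. Below is a minimal, stdlib-only sketch of that rule. The helper names are hypothetical, and it assumes plain numeric versions only (no pre-releases or local versions), so it is an illustration and not a substitute for a real resolver.

```python
# Sketch of PEP 440 "compatible release" (~=) matching for plain numeric versions.
# Helper names are hypothetical; pre-release and local versions are not handled.

def parse_release(version: str) -> tuple:
    """Turn '1.3.0' into (1, 3, 0)."""
    return tuple(int(part) for part in version.strip().split("."))


def compatible_release_bounds(spec: str) -> tuple:
    """Return (inclusive lower bound, exclusive upper bound) for '~=X.Y[.Z...]'."""
    if not spec.startswith("~="):
        raise ValueError("expected a specifier starting with '~='")
    lower = parse_release(spec[2:])
    if len(lower) < 2:
        raise ValueError("'~=' needs at least two release segments")
    # Drop the last segment and bump the one before it: 1.3.0 -> 1.4, 1.3 -> 2
    upper = lower[:-2] + (lower[-2] + 1,)
    return lower, upper


def _pad(a: tuple, b: tuple) -> tuple:
    """Zero-pad both tuples to the same length so that 1.3 compares equal to 1.3.0."""
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)), b + (0,) * (n - len(b))


def satisfies(version: str, spec: str) -> bool:
    """Check whether a plain numeric version falls inside a '~=' range."""
    v = parse_release(version)
    lower, upper = compatible_release_bounds(spec)
    v_lo, lo = _pad(v, lower)
    v_hi, hi = _pad(v, upper)
    return v_lo >= lo and v_hi < hi


if __name__ == "__main__":
    for candidate in ("1.3.5", "1.5.3", "2.0.0"):
        print(candidate, satisfies(candidate, "~=1.3.0"), satisfies(candidate, "~=1.3"))
```

This only shows that the two pins are not contradictory on their own, since any 1.3.x release satisfies both. It does not identify the cause of the compilation error on Python 3.9, which may come from other constraints in the resolved dependency set.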
Thank you!
Juan Luis
04/20/2023, 8:34 AM
kedro are going away in 0.19.0
pip install kedro-datasets[pandas.ExcelDataSet]
Mate Scharnitzky
04/20/2023, 8:38 AM
Juan Luis
04/20/2023, 9:47 AM