Hello there! I am trying to install kedro_datasets...
# questions
g
Hello there! I am trying to install kedro_datasets in order to use the pandas.ExcelDataset class to automate an xlsx loading process (I'm not interested in the rest of the pipeline at the moment). However, when I run
pip install kedro_datasets
a) I see kedro being installed as a dependency, and b) I get an OSError saying "no directory: /bin/pygrun" (pygrun appears to belong to antlr4-python3-runtime, which is a package for text processing). Is it possible to restrict the pip installation to just kedro_datasets, and (even stricter) to just a certain dataset type (e.g. pandas or pandas.ExcelDataset)? Googling didn't help with either a or b.
Running first
pip install omegaconf
and following with
pip install kedro_datasets
helped resolve the installation error. Still, though, `kedro` is being installed (is that expected?)
m
General thoughts: It's best to have something like requirements.txt or pyproject.toml with all requirements specified in one place, and then use a tool like `pip-tools` (`pip-compile`), `Poetry` or `uv` to resolve dependencies and versions. That lets you handle conflicts. Installing packages one by one is bad practice.
To address your specific question: `pip install --no-deps <your package>` should install only `<your package>`. Given what I wrote above, I don't recommend that path.
To install kedro_datasets with Excel support, you should go with:
pip install "kedro_datasets[pandas-exceldataset]"
👍 1
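For anyone following along, here is a minimal sketch of what loading a workbook could look like once that extra is installed. It assumes a recent kedro-datasets release where the class is spelled `ExcelDataset` (older releases used a different capitalization), and the function name `load_workbook_as_frame` and the file path in the usage note are made up for illustration.

```python
def load_workbook_as_frame(filepath, sheet_name=0):
    """Load one sheet of an Excel workbook through kedro-datasets.

    Hypothetical helper for illustration; not part of kedro-datasets itself.
    """
    # Imported lazily so this module can still be imported in an environment
    # where kedro-datasets (and therefore kedro) is not installed.
    from kedro_datasets.pandas import ExcelDataset

    # load_args are forwarded to pandas.read_excel by the dataset.
    dataset = ExcelDataset(
        filepath=filepath,
        load_args={"sheet_name": sheet_name},
    )
    return dataset.load()
```

Calling `load_workbook_as_frame("data/report.xlsx")` (a placeholder path) should return a pandas DataFrame for the first sheet, provided the install above succeeded.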
j
and to clarify: @George p yes, `kedro` is a dependency of `kedro-datasets`.
There was some discussion about this exact topic: https://github.com/kedro-org/kedro/issues/2409
We've also been collecting some evidence from users who want to install the Kedro Catalog without the rest of Kedro in https://github.com/kedro-org/kedro/issues/2741
May I ask what motivates you to install `kedro-datasets` without `kedro`?
👍 1
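If you want to check this in your own environment, the sketch below reads the requirements an installed distribution declares, using only the standard library. It is an illustration under assumptions: the helper names are invented here, and the string matching on `extra ==` markers is a simplification rather than a full marker parser.

```python
from importlib.metadata import PackageNotFoundError, requires


def split_requirements(requirement_lines, extra):
    """Separate always-installed requirements from those gated behind `extra`."""
    always, gated = [], []
    for line in requirement_lines:
        name, _, marker = line.partition(";")
        if "extra ==" not in marker:
            # No extra marker: pip installs this with the bare package.
            always.append(name.strip())
        elif f'"{extra}"' in marker or f"'{extra}'" in marker:
            # Only installed when the given extra is requested.
            gated.append(name.strip())
    return always, gated


def installed_requirements(dist_name):
    """Return the declared requirements of an installed distribution, or None."""
    try:
        return requires(dist_name) or []
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    declared = installed_requirements("kedro-datasets")
    if declared is None:
        print("kedro-datasets is not installed in this environment")
    else:
        always, gated = split_requirements(declared, "pandas-exceldataset")
        print("always installed:", always)
        print("added by the extra:", gated)
```

Anything listed under "always installed" comes along with a plain `pip install kedro_datasets`, regardless of which extra you pick.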
g
Belated reply, but thank you both for the tips and for the links! Interesting to read some of the historical conversations concerning choices made when developing Kedro.
While reading the discussion on the GitHub issue, I find myself resonating with the point: "wanted to share with colleagues the Dataset abstraction, without requiring the entire kedro installation/project". Having spent a few days on my problem, though, I see how my next steps are essentially converging on a pipeline that Kedro itself can help out with. :)
❤️ 1