# questions
j
Hi All, I was wondering if there was a way to install only the Azure Stack dependencies of Kedro. Thinking of something along the lines of `kedro[azure]`.
m
j
yes, more like excluding dependencies such as `botocore`, which seems to be AWS-specific.
n
At the moment you need to do this manually. Kedro's core dependencies don't include `botocore`: https://github.com/kedro-org/kedro-plugins/blob/main/kedro-datasets/setup.py
`botocore` most likely comes with `S3FS`. You can look at the specific dataset that you need and add its requirements to your project. This means that you cannot simply do `pip install kedro-datasets[spark.SparkDataSet]` without also pulling in the storage dependencies.
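For reference, one way to see which packages each extra would pull in is to inspect the distribution metadata and copy only the requirements you actually need into your own project. This is just a sketch using the standard-library `importlib.metadata`; the exact extra-marker naming shown in the comment is an assumption.

```python
# Minimal sketch: list the kedro-datasets requirements that belong to an extra,
# so you can pick out only the ones your project really needs.
# Requires Python 3.8+ and kedro-datasets to be installed.
from importlib.metadata import requires

for req in requires("kedro-datasets") or []:
    # Requirements owned by an extra carry a marker such as
    # '; extra == "spark-sparkdataset"' (exact naming may differ).
    if "extra ==" in req:
        print(req)
```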
j
ah this is what I did precisely, thanks for the info! That would be a very welcome addition in the future, since it would decrease overall install times for users, decrease Docker image sizes, and most importantly cause fewer headaches if some dependencies are internally blocked by a security vulnerability scan. Anyway, thanks for the quick response and the great work you guys are doing on Kedro!
n
I see what you mean. If you look at the Spark requirements:

```python
spark_require = {
    "spark.SparkDataSet": [SPARK, HDFS, S3FS],
    ...
}
```

It's not really just Spark but also HDFS and S3FS (who still uses HDFS these days?). We could potentially separate the storage. In the past most of our users were using `s3`; I guess that's why it's bundled. From the dependencies point of view, it is better to separate it. It does make the install command a bit longer and it is a breaking change:
`pip install kedro-datasets[spark.SparkDataSet]` may become `pip install kedro-datasets[spark.SparkDataSet,s3]`.
Cc @Juan Luis
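As a rough illustration of what that split could look like in kedro-datasets' `setup.py`, here is a hypothetical sketch; the `s3`/`hdfs` group names and the version pins are assumptions, not the current layout.

```python
# Hypothetical sketch of separating storage back-ends from the Spark extra.
# Names and version pins are illustrative only, not the real setup.py.
SPARK = "pyspark>=2.2, <4.0"
HDFS = "hdfs>=2.5.8, <3.0"
S3FS = "s3fs>=0.3.0"

spark_require = {
    # The dataset extra keeps only the engine itself...
    "spark.SparkDataSet": [SPARK],
}
storage_require = {
    # ...and each storage back-end becomes an opt-in extra of its own.
    "s3": [S3FS],
    "hdfs": [HDFS],
}
```

With something along these lines, users who never touch S3 would not pull in `s3fs`/`botocore` at all.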
In practice, what will be in the `azure` extra dependencies?
j
I actually don't know, I will have to check at work and get back to you!