# questions
e
Hi team, after yesterday's release of `kedro-datasets==1.5.0`, our CI started failing during system tests that do a `kedro run` for a pipeline with Spark (see the screenshot). As far as I can see, `SparkDataSet` is still defined under the same name as before. With `kedro-datasets==1.4.2`, the same tests ran smoothly. I also couldn't find anything specific in the release notes. Do we have to update our code (maybe some import statements, or how the package is specified in the requirements)?
d
The apparently broken import isn't pointing to `kedro_datasets`.
Can you share the catalog entry?
e
This is how it looks in the catalog:
```yaml
_csv: &csv
  type: spark.SparkDataSet
  file_format: csv
  load_args:
    sep: ","
    header: True
    inferSchema: True
  save_args:
    header: True
    mode: overwrite

prm_observation_time_frame:
  <<: *csv
  filepath: data/03_primary/prm_observation_time_frame.csv
  layer: primary
```
What confused me the most was that it worked in 1.4.2.
d
Hmm...
I wonder whether the lazy loading affects path discovery in some way. I'm not sure why it isn't looking in `kedro_datasets`.
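The lazy-loading hypothesis can be illustrated with a deliberately simplified sketch. It is not Kedro's resolver: `demo_datasets`, `make_lazy_module`, `install_fake_packages` and `resolve_type` are hypothetical names, and the prefix list is an assumption. Kedro does expand short type names such as `spark.SparkDataSet` against candidate package prefixes, but only loosely in the way imitated here. The sketch shows two things: a submodule that loads attributes lazily (a PEP 562 module-level `__getattr__`) defers an `ImportError` until the class is accessed, and a resolver that tries several prefixes and reports only its last attempt can raise an error that never mentions the package where the real failure happened.

```python
import importlib
import sys
import types
from typing import Sequence


def make_lazy_module(name: str, missing_dependency: str) -> types.ModuleType:
    """Create an in-memory module that resolves attributes lazily (PEP 562).

    Any attribute not defined on the module falls through to the module-level
    __getattr__, which raises ImportError to simulate a dataset class whose
    optional dependency (hypothetically pyspark) is not installed.
    """
    module = types.ModuleType(name)

    def __getattr__(attr: str):
        raise ImportError(f"{name}.{attr} requires {missing_dependency!r}, which is not installed")

    module.__getattr__ = __getattr__  # stored in the module __dict__, where PEP 562 looks for it
    return module


def install_fake_packages() -> None:
    """Register a hypothetical 'demo_datasets' package with a lazy 'spark' submodule."""
    sys.modules["demo_datasets"] = types.ModuleType("demo_datasets")
    sys.modules["demo_datasets.spark"] = make_lazy_module("demo_datasets.spark", "pyspark")


def resolve_type(type_name: str, prefixes: Sequence[str]):
    """Return the first object found by trying each prefix in order.

    Failures are swallowed per prefix and only the last attempt is reported,
    which is how a resolver like this can hide the real cause of an error.
    """
    if not prefixes:
        raise ValueError("at least one prefix is required")
    last_path, last_exc = "", None
    for prefix in prefixes:
        full_path = prefix + type_name
        module_path, _, attr_name = full_path.rpartition(".")
        try:
            module = importlib.import_module(module_path)
            return getattr(module, attr_name)  # a lazy ImportError surfaces here
        except (ImportError, AttributeError, ValueError) as exc:
            last_path, last_exc = full_path, exc
    raise ImportError(f"Could not import {last_path!r}: {last_exc}")


if __name__ == "__main__":
    install_fake_packages()
    try:
        resolve_type("spark.SparkDataSet", ["demo_datasets.", ""])
    except ImportError as exc:
        # The message names the bare 'spark' path, not demo_datasets.spark,
        # even though the lazy ImportError was raised inside demo_datasets.spark.
        print(exc)
```

In an environment without a top-level `spark` module, the final error mentions only `spark.SparkDataSet`, which mirrors the "not pointing to `kedro_datasets`" symptom above. Whether this is what actually happened in 1.5.0 is not established in this thread.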
e
Hi!
I fixed it like this in the requirements:
`kedro-datasets[spark-sparkdataset]~=1.5`
Happened to me yesterday too; all my pipelines were broken, haha.
d
How was it working in 1.4.2 without the `spark-sparkdataset` extra, I wonder?
@Elena Mironova, can you make sure you have the appropriate extras installed? If that isn't the problem, I can try to find some time to investigate whether `__all__` is getting populated during discovery here (I thought I checked that when implementing it, but something may not be working as expected). Otherwise, on a cursory pass, I don't see anything that shouldn't work...
e
(screenshot: image.png)
d
Ah, yes, the `spark.SparkDataSet` extra on `kedro-datasets` will do nothing. 🙂
n
Flagging https://github.com/kedro-org/kedro-plugins/pull/263. I don't recall us changing this behavior, so it's likely a bug.
I don't think `kedro-datasets[spark-sparkdataset]~=1.5` is what we intended; it could work as a temporary fix.
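One plausible link between the two names, offered as an assumption rather than something confirmed in this thread: PEP 685 normalizes extra names with the PEP 503 rule (collapse runs of `-`, `_` and `.` into a single `-`, then lowercase), under which `spark.SparkDataSet` becomes `spark-sparkdataset`. The hypothetical helper `normalize_extra` below re-implements that rule with the standard library so the mapping can be checked directly.

```python
import re


def normalize_extra(name: str) -> str:
    """Normalize a distribution or extra name as described in PEP 503 / PEP 685.

    Runs of '-', '_' and '.' collapse into a single '-', and the result is lowercased.
    """
    return re.sub(r"[-_.]+", "-", name).lower()


if __name__ == "__main__":
    for extra in ("spark.SparkDataSet", "spark-sparkdataset"):
        print(f"{extra} -> {normalize_extra(extra)}")
    # spark.SparkDataSet -> spark-sparkdataset
    # spark-sparkdataset -> spark-sparkdataset
```

Both spellings normalize to the same string, which would explain why the hyphenated form showed up at all; whether the 1.5.0 packaging change worked exactly this way is not stated here.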
e
Yeah, I also ended up pinning `kedro-datasets` to an older version that worked; I was just hoping for a more robust fix. @Deepyaman Datta, we have all the relevant extras listed in the starter's `setup.cfg`, exactly as before, so I'd assume the correct extras are installed (though I can't confirm it 100%, because our CI commands only list full packages through `pip freeze`).
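Because `pip freeze` only lists installed distributions, a small standard-library check can make a CI log more explicit about what the installed `kedro-datasets` actually declares. This is a sketch under assumptions: `describe_environment` is a hypothetical helper, and checking `pyspark` assumes it is the main runtime dependency pulled in by the Spark extra. It reports the installed version, the extras listed in the distribution's `Provides-Extra` metadata, and whether each given module can be found.

```python
import json
from importlib import metadata, util
from typing import Dict, Iterable, List, Optional


def describe_environment(
    dist_name: str = "kedro-datasets",
    module_names: Iterable[str] = ("pyspark",),
) -> dict:
    """Summarize one distribution and the importability of some top-level modules.

    Returns the installed version (None if the distribution is missing), the
    extras declared in its metadata, and a module -> importable mapping.
    """
    try:
        version: Optional[str] = metadata.version(dist_name)
        extras: List[str] = metadata.metadata(dist_name).get_all("Provides-Extra") or []
    except metadata.PackageNotFoundError:
        version, extras = None, []
    importable: Dict[str, bool] = {
        # For top-level names, find_spec locates the module without importing it.
        name: util.find_spec(name) is not None
        for name in module_names
    }
    return {"distribution": dist_name, "version": version, "extras": extras, "importable": importable}


if __name__ == "__main__":
    print(json.dumps(describe_environment(), indent=2))
```

Running this in CI right after installation would show at a glance whether the declared extras use the dotted or the hyphenated spelling and whether `pyspark` is actually present.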
n
@Elena Mironova, can you open an issue on https://github.com/kedro-org/kedro-plugins/issues/new/choose? Thanks for flagging this. Ideally, if we are going to fix it, we should fix it in the 1.5.x series.
n
@Elena Mironova, 1.5.2 is out and should fix this issue. Note that `kedro-datasets[spark-sparkdataset]` will not be supported, since it was added unintentionally.
e
Thank you!! So should we just list `kedro-datasets` in the requirements, without any optional extras?
n
If you need Spark, then `pip install "kedro-datasets[spark.SparkDataSet]"` (the quotes keep shells like zsh from treating the brackets as a glob pattern).