Kedro is an open-source Python framework for creating maintainable and modular data science code.

Hello team,
I have the following catalog entry in my YAML file.
The `columns` parameter below is not working. Am I missing something here? Thank you in advance!
```
raw_dataset:
  type: spark.SparkDataSet
  filepath: "/data/01_raw/data.csv"
  file_format: csv
  load_args:
    header: True
    inferSchema: True
    index: False
    columns: ["a", "b", "c"]
```

I’m not a Spark expert, but from their docs it doesn’t look like `columns` is something you can pass as a load argument. (<https://spark.apache.org/docs/latest/api/python/getting_started/quickstart_df.html>)

Yes, you would do this with `df.select(["a", "b", "c"])` within the node.

If you really wanted to do this with your catalog entry, you’d need to define your own subclassed dataset.
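
To make the two options above concrete, here is a minimal sketch, not a definitive implementation. `select_columns` and `ColumnSelectingMixin` are hypothetical names that are not part of Kedro or Spark. The mixin assumes the dataset class it is combined with loads its data in a `_load()` method and that the loaded object has a Spark-style `select` method.

```python
from __future__ import annotations

from typing import Any, Sequence


def select_columns(df: Any, columns: Sequence[str]) -> Any:
    """Node function: keep only the named columns of a Spark DataFrame."""
    # DataFrame.select accepts column names as positional arguments.
    return df.select(*columns)


class ColumnSelectingMixin:
    """Hypothetical mixin that narrows a loaded DataFrame to `columns`.

    Assumption: the class it is mixed into implements `_load()` and
    returns an object with a Spark-style `select` method.
    """

    def __init__(self, *args: Any, columns: Sequence[str] | None = None, **kwargs: Any) -> None:
        # Pass everything else through to the real dataset constructor.
        super().__init__(*args, **kwargs)
        self._columns = list(columns) if columns else None

    def _load(self) -> Any:
        df = super()._load()
        # With no columns configured, return the data unchanged.
        return df.select(*self._columns) if self._columns else df
```

With this sketch, a class such as `class ColumnSparkDataSet(ColumnSelectingMixin, SparkDataSet): pass` could be referenced from the catalog `type`, with `columns` given as a top-level key of the entry rather than inside `load_args`. That only holds if your Kedro version's `SparkDataSet` loads through `_load()`, so check it against the version you have installed.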

Oh, I remember now :slightly_smiling_face:
`columns` works with the pandas dataset.