# questions
p
Hello team, I have the following catalog entry in my yaml file. Columns parameter below is not working. Am I missing something here? Thank you in advance!
raw_dataset:
  type: spark.SparkDataSet
  filepath: "/data/01_raw/data.csv"
  file_format: csv
  load_args:
    header: True
    inferSchema: True
    index: False
    columns: ["a", "b", "c"]
m
I’m not a Spark expert, but from their docs it doesn’t look like the reader takes a columns argument. (https://spark.apache.org/docs/latest/api/python/getting_started/quickstart_df.html)
@datajoely probably knows
d
Yes, you would do this with
df.select(["a", "b", "c"])
within the node. If you really wanted to do this from your catalog entry, you’d need to define your own subclassed dataset.
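For reference, a minimal sketch of what the node-side selection could look like. The function name `select_columns` and the `COLUMNS` constant are made up for illustration; the only Spark call used is `DataFrame.select`, and it assumes the catalog entry is loaded as a Spark DataFrame with the `columns` load arg removed.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only for type hints, so the module itself loads without Spark.
    from pyspark.sql import DataFrame

# Hypothetical list of columns to keep; taken from the catalog entry above.
COLUMNS = ["a", "b", "c"]


def select_columns(raw_dataset: DataFrame) -> DataFrame:
    """Node function: keep only the columns of interest from the loaded data."""
    # DataFrame.select accepts a list of column names.
    return raw_dataset.select(COLUMNS)
```

The function would then be wired into a pipeline as a node that takes `raw_dataset` as its input, so the column selection happens right after loading rather than in the catalog.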
p
Oh, I remember now 🙂 columns works with the pandas dataset.