tingting wan
01/04/2023, 4:25 PM
Deepyaman Datta
01/04/2023, 4:56 PM
filepath is in (most) data catalog entries, including for CSV datasets?
Jannic Holzer
01/04/2023, 5:18 PM
tingting wan
01/04/2023, 5:19 PM
dbfs:/mnt/weather/forecast/forecast_<date> having a similar schema (not exactly the same; some files are missing one field, for example). I am interested in the <date> part, as I can do filtering within each file, so I would like to have a column containing the source path for each row.
Deepyaman Datta
01/04/2023, 5:24 PM
df.withColumn('input_file', input_file_name()) in your node.
Michał Madej
01/04/2023, 5:36 PM
tingting wan
01/04/2023, 6:10 PM
df.withColumn('input_file', input_file_name()) works, but how can I apply similar logic in Kedro?
spark.SparkDataSet, and args
load_args:
  header: False
  recursiveFileLookup: True
It works by passing
filepath: dbfs:/mnt/weather/forecast/*
I want to add a column holding the exact source path name, e.g., dbfs:/mnt/weather/forecast/forecast_2022-1-2, dbfs:/mnt/weather/forecast/forecast_2022-1-3, dbfs:/mnt/weather/forecast/forecast_2022-1-4, etc.
Deepyaman Datta
01/04/2023, 7:17 PM
tingting wan
01/06/2023, 9:36 AM