Eduardo Romero López
07/09/2023, 11:32 AM
Juan Luis
07/09/2023, 11:34 AM
return the result
from kedro.pipeline import node, pipeline

pipeline([
    node(
        func=intermediate_data,
        inputs=["raw_dataframe"],
        outputs="intermediate_dataframe",  # the keyword is `outputs`, not `output`
    )
])
where "raw_dataframe" and "intermediate_dataframe" are defined in conf/base/catalog.yml
Eduardo Romero López
07/09/2023, 11:40 AM
Juan Luis
07/09/2023, 11:45 AM
return df
and then declare the dataset in catalog.yml as
intermediate_df:
  type: pandas.ParquetDataset
  filepath: data/02_intermediate/preprocessed_queries.pq
(pseudocode, didn't test it but you get the idea)
Eduardo Romero López
07/09/2023, 11:48 AM
Nok Lam Chan
07/09/2023, 1:46 PM
Eduardo Romero López
07/09/2023, 4:25 PM