Emilio Gagliardi
07/28/2023, 4:52 AM
pipeline([
    node(
        func=extract_rss_feed,
        inputs='rss_feed_extract',
        outputs='rss_feed_for_transforming',
        name="extract_rss_feed",
    ),
    node(
        func=transform_rss_feed,
        inputs=['rss_feed_for_transforming', 'params:rss_1'],
        outputs='rss_feed_for_loading',
        name="transform_rss_feed",
    ),
    node(
        func=load_rss_feed,
        inputs='rss_feed_for_loading',  # <- incoming data (in memory)
        outputs='rss_feed_load',  # <- calls the _save() of the class
        name="load_rss_feed",
    ),
])
nodes.py
If all the save logic is in the class, then there's nothing for the function to do... What am I missing here? What typically goes in the function whose output is a dataset?
def load_rss_feed(preprocessed_rss_feed):
    pass
When I try to run the pipeline, I get the following error:
DatasetError: Saving 'None' to a 'Dataset' is not allowed
Thanks for your thoughts!
Deepyaman Datta
07/28/2023, 5:49 AM
[...] _save method, that you've commented (<- calls the _save() of the class). So, your load_rss_feed should do something like return preprocessed_rss_feed, not just pass.
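A minimal sketch of why `pass` triggers the error while `return` does not. It deliberately avoids importing Kedro: `ToyDataset` and `run_node` are hypothetical stand-ins invented for illustration, not Kedro's real classes or runner, and the only behaviour they imitate is the one described in this thread (the node's return value is handed to the output dataset, and `None` is rejected).

```python
class ToyDataset:
    """Hypothetical stand-in for a custom dataset; NOT Kedro's actual class."""

    def __init__(self):
        self.saved = None

    def save(self, data):
        # Imitates the behaviour reported in the thread: None is refused.
        if data is None:
            raise ValueError("Saving 'None' to a 'Dataset' is not allowed")
        self._save(data)

    def _save(self, data):
        # In a real dataset this is where the write logic would live.
        self.saved = data


def load_rss_feed_broken(preprocessed_rss_feed):
    pass  # a function body of `pass` implicitly returns None


def load_rss_feed(preprocessed_rss_feed):
    # Returning the data is what makes it available to the output dataset.
    return preprocessed_rss_feed


def run_node(func, data, dataset):
    """Toy runner: call the node function, then save whatever it returned."""
    dataset.save(func(data))


if __name__ == "__main__":
    feed = [{"title": "example entry"}]

    good = ToyDataset()
    run_node(load_rss_feed, feed, good)
    print("saved:", good.saved)

    try:
        run_node(load_rss_feed_broken, feed, ToyDataset())
    except ValueError as err:
        print("error:", err)
```

The point of the sketch is only the data flow: the save logic can live entirely in the dataset class, but the node function still has to return something for that logic to receive.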
Emilio Gagliardi
07/31/2023, 4:49 PM
Deepyaman Datta
07/31/2023, 9:03 PM
"So it's normal to have node functions that have no content?"
No, it's not. It's typical to do some sort of transformation in your node (and generally advisable not to have a function that's essentially a no-op).
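As a rough illustration of a final node that does some work before its output is saved, here is one possible shape for `load_rss_feed`. The entry structure (a list of dicts with `title` and `link` keys) is an assumption made for this sketch; the thread does not say what the transformed feed looks like.

```python
def load_rss_feed(preprocessed_rss_feed):
    """Shape transformed entries into the records the output dataset will save.

    The "title" and "link" field names are illustrative assumptions,
    not something stated in the thread.
    """
    records = []
    for entry in preprocessed_rss_feed:
        # Skip entries without a link, since there is nothing to key them on.
        if not entry.get("link"):
            continue
        records.append(
            {
                "title": entry.get("title", "").strip(),
                "link": entry["link"],
            }
        )
    # Returning the records is what hands them to the output dataset.
    return records


if __name__ == "__main__":
    sample = [
        {"title": "  First post ", "link": "https://example.com/1", "raw": "..."},
        {"title": "No link here"},
    ]
    print(load_rss_feed(sample))
```

If the node really has nothing to add, another option is to drop it and point the previous node's output directly at the dataset that does the saving, which avoids a function that only passes data through.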