# questions
a
Hi Team, I have a basic doubt about using `PartitionedDataSet`. In the below pipeline, I have a node which returns a dictionary with values as pandas dataframes, so I define a `PartitionedDataSet` catalog entry for it. If I run the pipeline only up to this node, the files do get saved in the correct location, but the output is an empty dictionary. If I add an identity node, then the correct key-value pairs are returned. Is this the desired behaviour?
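For context, here is a minimal sketch of the setup being described, reusing the node and dataset names that appear later in the thread; the catalog path, underlying dataset type, and DataFrame contents are assumptions for illustration:
```python
import pandas as pd
from kedro.pipeline import Pipeline, node


def a_node_that_creates_a_part_dataset(**kwargs):
    # Placeholder DataFrames; in the real pipeline these come from upstream processing.
    df1 = pd.DataFrame({"x": [1, 2]})
    df2 = pd.DataFrame({"x": [3, 4]})
    df3 = pd.DataFrame({"x": [5, 6]})
    # A plain dict of DataFrames: Kedro writes one partition file per key.
    return {"key1": df1, "key2": df2, "key3": df3}


# Hypothetical catalog.yml entry backing the node output
# (path and dataset type are placeholders):
#
#   actual_key_value_pair_part_ds_output:
#     type: PartitionedDataSet
#     path: data/07_model_output/partitions
#     dataset: pandas.CSVDataSet
#     filename_suffix: ".csv"

pipeline = Pipeline(
    [
        node(
            a_node_that_creates_a_part_dataset,
            inputs=None,
            outputs="actual_key_value_pair_part_ds_output",
        ),
    ]
)
```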
d
Doing a lambda here is a bit confusing; if you do a `def` with a debugger it will be more intuitive. But essentially `df` in this situation is a dictionary of key: lazy-loader pairs, and `actual_key_value_pair_part_ds_output` needs to be a `PartitionedDataSet` too to save like this, or more logic is required to handle it in a graceful way.
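To make the "more logic" option concrete, a downstream node can materialise the partitions itself. This is a sketch that assumes, as in the Kedro documentation, that each value of the incoming dictionary is a no-argument callable which loads one partition; the function name is made up:
```python
from typing import Callable, Dict

import pandas as pd


def combine_partitions(partitioned_input: Dict[str, Callable[[], pd.DataFrame]]) -> pd.DataFrame:
    """Materialise every partition of a PartitionedDataSet input and concatenate them."""
    loaded = {}
    for partition_id, load_partition in partitioned_input.items():
        # Each value is a lazy loader; calling it reads that partition from disk.
        loaded[partition_id] = load_partition()
    return pd.concat(loaded.values(), ignore_index=True)
```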
a
@datajoely I am not actually using the 2nd node. It is just that the first node, although it writes the partitioned dataset correctly, itself returns an empty dictionary.
Just need to confirm whether this is the expected behaviour, i.e. that the first node does not return a dictionary of lazy loaders either.
d
Can I see the function `a_node_that_creates_a_part_dataset`?
a
Basically looks like this
def a_node_that_creates_a_part_dataset(**kwargs):
    return {'key1': df1, 'key2': df2, 'key3': df3}
the outputs can have any number of keys
d
Yes, this doesn't look right. I would change the second function to a `def` and check with a debugger. I would expect `{'key1': df1.load(), 'key2': df2.load(), 'key3': df3.load()}` to be passed in.
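As a sketch of that suggestion, replacing the lambda with a `def` makes it easy to pause in a debugger and confirm what Kedro actually passes in for a `PartitionedDataSet` input; the function name is made up, and the exact shape of the values (lazy loaders) is what the breakpoint is there to verify:
```python
def debug_partitioned_input(df):
    # Pause here and inspect `df`: for a PartitionedDataSet input it should be a
    # dictionary keyed by partition id, with lazy loaders as values.
    breakpoint()
    # Return the mapping unchanged once the structure has been confirmed.
    return dict(df)
```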
👍🏼 1