Galen Seilis
08/24/2023, 2:13 PM
node(
func=lambda data:data.drop_duplicates(),
inputs='some_data_set',
outputs='dup_dropped_data',
name='drop_duplicates'
)
But there are plenty of commands in Pandas that act on only a single column, like replace.
node(
func=lambda data:data.some_column.replace({'meow':'woof'}),
inputs='some_data_set',
outputs='demeowed_data',
name='remove_meowing'
)
The latter example will only return the Pandas series for some_column. Is there a way to change a single column but return the entire dataframe?
datajoely
08/24/2023, 2:18 PM
literal()
It's been asked before, but not enough people asked for it, so it looks like the issue was closed:
https://github.com/kedro-org/kedro/issues/526
Galen Seilis
08/24/2023, 2:23 PM
datajoely
08/24/2023, 2:30 PM
Lodewic van Twillert
08/24/2023, 3:05 PM
You can apply .replace to the whole DataFrame at once. You can pass dictionaries to specify which values should be replaced in which columns, so you can replace values in multiple columns at the same time.
node(
func=lambda data: data.replace(to_replace={'some_column': 'meow'}, value={'some_column':'woof'}),
inputs='some_data_set',
outputs='demeowed_data',
name='remove_meowing'
)
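To check the column-scoped behaviour outside of a pipeline, here is a minimal sketch in plain pandas. The sample data and the other_column name are made up for illustration; the nested-dict call is an alternative spelling that pandas also accepts, not something from the node above.

```python
import pandas as pd

# Hypothetical sample data; only some_column should be changed.
df = pd.DataFrame({
    "some_column": ["meow", "purr", "meow"],
    "other_column": ["meow", "x", "y"],
})

# Same call as in the node: look for 'meow' in some_column only,
# and replace it with 'woof' in that column.
result = df.replace(
    to_replace={"some_column": "meow"},
    value={"some_column": "woof"},
)

# Alternative nested-dict form: {column: {old_value: new_value}}.
result_nested = df.replace({"some_column": {"meow": "woof"}})

print(result)
```

Both calls return a whole DataFrame, and the 'meow' in other_column is left alone because that column is not named in the dictionaries.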
But let's assume you still want to use methods that only apply to a single column.
You can always use pd.DataFrame.assign() to re-assign an existing column and return the dataframe anyway, using another lambda within the .assign():
your_dataframe.assign(some_value=lambda d: do_something(d))
In this case, d is the entire dataframe that you are applying .assign to.
Use it like this in your node if you want:
node(
func=lambda data: data.assign(your_column=lambda d: d.your_column.replace({'meow':'woof'})),
inputs='some_data_set',
outputs='demeowed_data',
name='remove_meowing'
)
--edit: even easier might be this, dropping the lambda within .assign()
node(
func=lambda data: data.assign(your_column=data.your_column.replace({'meow':'woof'})),
inputs='some_data_set',
outputs='demeowed_data',
name='remove_meowing'
)
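The .assign() approach can also be written as a small named function, which is easier to test on its own than a lambda. This is only a sketch: the function name, the other column, and the sample data are hypothetical, and it runs without Kedro.

```python
import pandas as pd


def remove_meowing(data: pd.DataFrame) -> pd.DataFrame:
    """Replace 'meow' with 'woof' in your_column and return the whole dataframe."""
    # The lambda receives the dataframe that .assign() is called on.
    return data.assign(
        your_column=lambda d: d["your_column"].replace({"meow": "woof"})
    )


# Hypothetical sample data to try the function on.
df = pd.DataFrame({"your_column": ["meow", "purr"], "other": [1, 2]})
cleaned = remove_meowing(df)

print(cleaned)
```

.assign() returns a new dataframe, so df itself is not modified. A function like this can be passed as func=remove_meowing to node(...) in place of the lambda.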
Galen Seilis
08/25/2023, 3:08 AM
value parameter of pandas.DataFrame.replace. I appreciate you pointing out that using it allows the entire dataframe to be returned.
Iñigo Hidalgo
08/28/2023, 12:00 PM