#questions

Iñigo Hidalgo

01/19/2023, 9:17 AM
Hey all, simple question: is it possible to pass both positional and keyword arguments to a Kedro node? My example use case is the sklearn `train_test_split` function, which takes an arbitrary number of arrays positionally, while keyword arguments like `test_size` need to be passed by name. It would need a combination of passing an iterable and a dictionary to the node's `inputs`, which as far as I know isn't doable. If it's not possible, how would you suggest I proceed? My objective is to feed outputs from different nodes into that function, whose output then goes into a train node.
I'm trying to think how I could massage the input dataframes to be able to pass them into that node. The function is actually my own code so I could modify its signature to for example accept the dataframes as a tuple kwarg instead of an arbitrary number of positional arguments, but I can't think of a way to combine these different dataframes coming from different nodes into a tuple without adding an extra node to collect them, which I would like to avoid if possible.
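To make the limitation concrete, here is a small sketch in plain Python. It assumes that Kedro calls a node's function as `func(*values)` for list-style inputs and `func(**values)` for dict-style inputs; `split_frames` is a hypothetical stand-in for the splitting function, not sklearn's `train_test_split`.

```python
def split_frames(*frames, test_size=0.25):
    """Hypothetical stand-in: split each sequence into (train, test) parts."""
    parts = []
    for frame in frames:
        cut = len(frame) - int(len(frame) * test_size)
        parts.append((frame[:cut], frame[cut:]))
    return parts


features = [1, 2, 3, 4]
target = [10, 20, 30, 40]

# List-style inputs arrive positionally, so test_size cannot be named
# and silently keeps its default.
by_list = split_frames(*[features, target])

# Dict-style inputs arrive by keyword, so the frames no longer fit *frames.
try:
    split_frames(**{"features": features, "target": target, "test_size": 0.5})
    dict_call_error = None
except TypeError as exc:
    dict_call_error = str(exc)
```

Neither call style alone can supply both the variadic frames and a named `test_size`, which is the gap the rest of the thread tries to bridge.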

datajoely

01/19/2023, 9:45 AM
So I actually think to do this you may need to get creative
In the same way that you can use `functools.partial` to pass a literal Python value to a node input, you could prepare your arguments ahead of time in a similar way: https://stackoverflow.com/a/58875821/2010808
It's not pretty but it would work
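A minimal sketch of the `functools.partial` idea, using a hypothetical `split_frames` function in place of the real one. A bare partial object has no `__name__`, so `update_wrapper` is used here to copy the name and docstring across; whether Kedro needs that is an assumption, not something stated in this thread.

```python
from functools import partial, update_wrapper


def split_frames(*frames, test_size=0.25):
    """Hypothetical stand-in: split each sequence into (train, test) parts."""
    parts = []
    for frame in frames:
        cut = len(frame) - int(len(frame) * test_size)
        parts.append((frame[:cut], frame[cut:]))
    return parts


# Bind the keyword argument up front; the frames can then be passed
# positionally, as list-style node inputs would be.
split_half = update_wrapper(partial(split_frames, test_size=0.5), split_frames)

result = split_half([1, 2, 3, 4], [10, 20, 30, 40])
```

The bound value is fixed when the pipeline is defined, so this only helps when the keyword argument can be a hardcoded literal.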

Ben Horsburgh

01/19/2023, 9:47 AM
You can achieve it with a wrapt decorator, but that is starting to get quite creative

Iñigo Hidalgo

01/19/2023, 9:48 AM
That would allow me to pass the kwarg ahead of time as a hardcoded literal, but if possible I would like the kwarg to be defined thru config, as I am trying to build a modular pipeline which will be used in different scenarios.
@Ben Horsburgh I'm happy to get creative and decorators are always fun, do you have a suggestion on how you would approach this?
But tbh I'm leaning towards "it can't be done" and just going with the extra node to collect

datajoely

01/19/2023, 9:56 AM
This is the library he’s referring to, I’m not familiar with it https://wrapt.readthedocs.io/en/latest/decorators.html
there is a theme here that at some level you need to introduce a proxy somehow

Ben Horsburgh

01/19/2023, 10:31 AM
I have a snippet that does something very similar, will modify and post soon
So, MASSIVE CAVEAT = I think that the suggestion you make @Iñigo Hidalgo to use a function to prepare the data is possibly better, because it is more explicit (more zen-like). However, if you reeeeeaaally want to, and also because it is fun, the following decorator shows how you can do stuff like this:
import inspect
from typing import Dict, List, Optional, Tuple

import wrapt

class PositionalWithKwargs:
    def __init__(self, convert_to_positional: List[str]):
        self.convert_to_positional = convert_to_positional

    def rewrite(self, *args, **kwargs) -> Tuple[Tuple, Dict]:
        # extract positional from kwargs
        positional = [kwargs[k] for k in self.convert_to_positional]

        # remove positional from kwargs
        kwargs = {
            k: v for k, v in kwargs.items() if k not in self.convert_to_positional
        }

        # prepend args with positional
        args = [*positional, *args]

        return tuple(args), kwargs

    def argspec_factory(self, wrapped) -> Optional[inspect.FullArgSpec]:
        # 'args' is a list of the parameter names.
        # 'varargs' and 'varkw' are the names of the * and ** parameters or None.
        # 'defaults' is an n-tuple of the default values of the last n parameters.
        # 'kwonlyargs' is a list of keyword-only parameter names.
        # 'kwonlydefaults' is a dictionary mapping names from kwonlyargs to defaults.
        # 'annotations' is a dictionary mapping parameter names to annotations.
        (
            args,
            varargs,
            varkw,
            defaults,
            kwonlyargs,
            kwonlydefaults,
            annotations,
        ) = inspect.getfullargspec(wrapped)

        # grab arg defaults
        defaults = defaults or ()
        arg_defaults = {
            arg: default
            for arg, default in (
                reversed(list(zip(reversed(args), reversed(defaults))))
            )
        }

        # add positional args so they are expected as kwargs
        kwonlyargs = kwonlyargs or []
        kwonlydefaults = kwonlydefaults or {}
        kwonlyargs.extend(self.convert_to_positional)
        kwonlydefaults.update(
            **{
                arg: default
                for arg, default in arg_defaults.items()
                if arg in self.convert_to_positional
            }
        )

        # remove positional args we now expect as kwargs
        defaults = tuple(
            default
            for arg, default in (
                reversed(list(zip(reversed(args), reversed(defaults))))
            )
            if arg not in self.convert_to_positional
        )
        args = [arg for arg in args if arg not in self.convert_to_positional]

        return inspect.FullArgSpec(
            args, varargs, varkw, defaults, kwonlyargs, kwonlydefaults, annotations
        )

    def __call__(self, func):
        @wrapt.decorator(adapter=self.argspec_factory(func))
        def __call__(wrapped, instance, args, kwargs):

            args, kwargs = self.rewrite(*args, **kwargs)
            return wrapped(*args, **kwargs)

        return __call__(func)
Example usage:
def my_function(a, b, c, /, x):
    return a + b + c + x

print(my_function(1, 2, 3, 4))
# >>> 10

wrapped_func = PositionalWithKwargs(["a", "b"])(my_function)
print(wrapped_func(1, a=2, b=3, x=4))
# >>> 10
There are a couple of important bits to doing it this way:
1. You need to modify `fullargspec`: Kedro uses this to map parameters correctly, and users will rely on it any time they read a stack trace or use IDE prompts
2. You need to intercept and rewrite the `args` and `kwargs` sent to the call
3. You should use `wrapt` as opposed to native Python wrappers because it does a ton of amazing magic that preserves docstrings and the like
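To make points 1 and 2 concrete without the `wrapt` machinery, here is a stripped-down sketch using only the standard library. It is not the snippet above and not a replacement for it: `keywords_to_positional` is a hypothetical name, and this version swaps `wrapt` for `functools.wraps` plus an explicit `__signature__`, so it skips whatever extra handling `wrapt` provides.

```python
import functools
import inspect


def keywords_to_positional(*names):
    """Hypothetical helper: accept `names` by keyword, forward them positionally."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Point 2: pull the named values out of kwargs and put them first.
            leading = [kwargs.pop(name) for name in names]
            return func(*leading, *args, **kwargs)

        # Point 1: advertise the moved parameters as keyword-only so that
        # tools reading the signature see how the wrapper must be called.
        signature = inspect.signature(func)
        params = list(signature.parameters.values())
        moved = [
            p.replace(kind=inspect.Parameter.KEYWORD_ONLY)
            for p in params
            if p.name in names
        ]
        kept = [p for p in params if p.name not in names]
        var_kw = [p for p in kept if p.kind is inspect.Parameter.VAR_KEYWORD]
        rest = [p for p in kept if p.kind is not inspect.Parameter.VAR_KEYWORD]
        wrapper.__signature__ = signature.replace(parameters=rest + moved + var_kw)
        return wrapper

    return decorator


def my_function(a, b, c, /, x):
    return a + b + c + x


wrapped_func = keywords_to_positional("a", "b")(my_function)
total = wrapped_func(1, a=2, b=3, x=4)  # forwards as my_function(2, 3, 1, x=4)
visible_params = list(inspect.signature(wrapped_func).parameters)
```

Here `inspect.signature(wrapped_func)` reports `c` and `x` first and `a` and `b` as keyword-only, which mirrors what the `argspec_factory` method does for `getfullargspec`.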

Iñigo Hidalgo

01/19/2023, 11:20 AM
Wow @Ben Horsburgh thank you so much for this. It's going to take me a while to parse the entirety of this snippet but I can see where you're going with it and it seems super interesting. It would make it easier to adapt already-existing code to the Kedro structure.
For a one-off I would definitely just use my collecting node, but it's actually something I've wanted to do on multiple occasions so I will probs put in the work to include your code in my pipelines. Thank you so much again!!

datajoely

01/19/2023, 11:27 AM
yeah in short - this isn’t easily possible out of the box, but it’s part of a wider problem some people are thinking about
(ben didn’t write all of that snippet in 2 mins 😛)

Ben Horsburgh

01/19/2023, 11:41 AM
Haha yeah it's from a collection of snippets I've been messing around with for a while, but can't ever seem to make work cleanly. I always come back to - just do an explicit proxy function
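For comparison, a sketch of what an explicit proxy function could look like. `split_frames`, `split_features_and_target`, and the dataset and parameter names in the commented-out wiring are all hypothetical; the only Kedro behaviour assumed is that dict-style `inputs` map argument names to dataset names and that a `params:` prefix pulls a value from config.

```python
def split_frames(*frames, test_size=0.25):
    """Hypothetical stand-in: split each sequence into (train, test) parts."""
    parts = []
    for frame in frames:
        cut = len(frame) - int(len(frame) * test_size)
        parts.append((frame[:cut], frame[cut:]))
    return parts


def split_features_and_target(features, target, test_size):
    """Explicit proxy: named parameters in, positional call out."""
    return split_frames(features, target, test_size=test_size)


# Hypothetical Kedro wiring (dataset and parameter names are made up):
# node(
#     split_features_and_target,
#     inputs={
#         "features": "model_features",
#         "target": "model_target",
#         "test_size": "params:test_size",
#     },
#     outputs="split_data",
# )

result = split_features_and_target([1, 2, 3, 4], [10, 20, 30, 40], test_size=0.5)
```

The cost is one small function per combination of inputs, but the mapping from datasets to arguments stays visible in the pipeline definition and `test_size` can come from config.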