Hello! I was wondering what are the good practices...
# questions
f
Hello! I was wondering what the good practices are for Python classes in Kedro. For instance, one of my nodes (define_text_processor), defined in nodes.py, initializes an instance of a class (TextProcessor):
def define_text_processor():
    transformations = [LowerText(), RemoveUniCode(), StopWordsRemoval(), Stemming()]
    text_processor = TextProcessor(transformations)

    return text_processor
Where should I put the code for the TextProcessor class definition?
class TextProcessor:
    def __init__(self, transformations=None):
        if transformations is None:
            transformations = list()
        self.transformations = transformations

    def process_data(self, data: str):
        for transformation in self.transformations:
            data = transformation.process(data)
        return data

    def __repr__(self):
        return f'TextProcessor object with the following transformations: {[o.name for o in self.transformations]}'
In nodes.py (even though TextProcessor is not a node)? Or should it be in the catalog? And isn't it an issue that we won't see the code of this class in Kedro viz? Thanks in advance!
m
Hi @Francis Duval, that's a good question, and it might come down to personal preference. I would create a separate Python file, text_processor.py, and just import it in nodes.py. Since it's not node or pipeline code, it makes sense to me to keep it separate. If you don't have many nodes and TextProcessor is your only class, you could also decide to just keep it all in nodes.py.
I would not keep it in the catalog. First of all, that's a .yml file, so it would make things complicated. Second, the catalog is really meant to hold the input/output datasets used in your pipeline, and this class isn't a dataset.
With respect to Kedro Viz, it won't be an issue for the tool itself that you can't see the code, but the question is of course what you as a user want or expect. In Viz you don't see all the code of your Kedro project, just the nodes, and since TextProcessor isn't a node it wouldn't show, which to me personally makes sense. Hope this helps 🙂
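For illustration, here is a minimal sketch of the separate-file layout, assuming a hypothetical text_processor.py sitting in the same package as nodes.py. Both "files" are shown in one block so it runs on its own. The LowerText class is a stand-in, since the real transformation classes are not shown in the question, and the relative import in the comment is an assumption about the project layout.

```python
# --- text_processor.py (hypothetical module placed next to nodes.py) ---

class LowerText:
    """Stand-in transformation (assumption: the real LowerText is not shown)."""

    name = "LowerText"

    def process(self, data: str) -> str:
        return data.lower()


class TextProcessor:
    """Applies a list of transformations to a string, in order."""

    def __init__(self, transformations=None):
        # Copy into a fresh list so callers cannot mutate our state by accident.
        self.transformations = list(transformations or [])

    def process_data(self, data: str) -> str:
        for transformation in self.transformations:
            data = transformation.process(data)
        return data

    def __repr__(self):
        names = [t.name for t in self.transformations]
        return f"TextProcessor object with the following transformations: {names}"


# --- nodes.py ---
# In a real project the classes above would be imported instead, for example:
# from .text_processor import LowerText, TextProcessor

def define_text_processor() -> TextProcessor:
    """Node function: builds and returns the configured processor."""
    return TextProcessor([LowerText()])


if __name__ == "__main__":
    processor = define_text_processor()
    print(processor.process_data("Hello Kedro"))
```

With this split, nodes.py holds only the functions that become nodes, and the class lives in plain Python that can be imported and tested without running a pipeline.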
f
Thank you so much Merel, it clarifies my questions a lot!