Francis Duval
01/09/2024, 1:36 PM
def define_text_processor():
    transformations = [LowerText(), RemoveUniCode(), StopWordsRemoval(), Stemming()]
    text_processor = TextProcessor(transformations)
    return text_processor
Where should I put the code for the TextProcessor class definition?
class TextProcessor:
    def __init__(self, transformations=None):
        if transformations is None:
            transformations = list()
        self.transformations = transformations
    def process_data(self, data: str):
        for transformation in self.transformations:
            data = transformation.process(data)
        return data
    def __repr__(self):
        return f'TextProcessor object with the following transformations: {[o.name for o in self.transformations]}'
In nodes.py (even though TextProcessor is not a node)? Or should it be in the catalog? And isn't it an issue that we won't see the code of this class in Kedro viz?
Thanks in advance!
Merel
01/09/2024, 5:20 PM
Put it in its own file, text_processor.py, and just import it in nodes.py.
Since it's not Node or Pipeline code, it makes sense to me to keep it separate. If you don't have many nodes and TextProcessor is your only class, you could also decide to just keep it all in nodes.py.
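A rough sketch of that layout, in case it helps. It is written as one runnable script, with comments marking which part would go in which file. LowerText here is a simplified stand-in for your real transformation, clean_text is a made-up node function, and the relative import in the comment assumes both files sit in the same package.

```python
# --- would live in text_processor.py ---------------------------------------
class LowerText:
    """Simplified stand-in for one of the transformations in the question."""

    name = "lower_text"

    def process(self, data: str) -> str:
        return data.lower()


class TextProcessor:
    """Applies each transformation to the text, in order."""

    def __init__(self, transformations=None):
        self.transformations = list(transformations or [])

    def process_data(self, data: str) -> str:
        for transformation in self.transformations:
            data = transformation.process(data)
        return data

    def __repr__(self):
        names = [t.name for t in self.transformations]
        return f"TextProcessor object with the following transformations: {names}"


# --- would live in nodes.py --------------------------------------------------
# Assumed import when the two files are split (same package):
# from .text_processor import LowerText, TextProcessor
def define_text_processor() -> TextProcessor:
    return TextProcessor([LowerText()])


def clean_text(text: str, text_processor: TextProcessor) -> str:
    """Hypothetical node function that only delegates to the class."""
    return text_processor.process_data(text)
```

Calling clean_text("Hello Kedro", define_text_processor()) runs the text through every transformation in order. The node functions stay thin, and the class can be tested without involving Kedro at all.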
I would not keep it in the catalog. First of all, that's a .yml file, so it would make things complicated. Second of all, the catalog is really meant to hold the input/output datasets used in your pipeline, and this class isn't a dataset.
With respect to Kedro Viz, it won't be an issue for the tool itself that you can't see the code, but the question is of course what you as a user want or expect. In Viz you don't see all the code of your Kedro project, just the nodes, and since TextProcessor isn't a node it wouldn't show, which to me personally makes sense.
Hope this helps 🙂
Francis Duval
01/09/2024, 6:00 PM