Samuel Santos
03/04/2024, 12:51 AM
datajoely
03/04/2024, 9:24 AM
metadata field available in the catalog that you can use for any purpose and access programmatically in hooks and other parts of the run lifecycle
datajoely
03/04/2024, 9:24 AM
Deepyaman Datta
03/04/2024, 3:02 PM
modeling pipeline (or whatever you want to call the post-feature-engineering pipeline).
For feature engineering, how different are they? If the general process is pretty similar, you can have e.g. an encode_categorical_features node in your pipeline, and it can accept an argument with the list of features to encode. The normal approach would be to pass that list as parameters (instead of inventing a sidecar YAML construct). For example, you may have namespaced parameters:
whatever.categorical_columns:
- col_a
- col_d
These will be used for the corresponding dataset whatever.joined_data in the modular pipeline instance with namespace whatever.
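To make that concrete, here is a rough sketch of what such a node function could look like. It is only an illustration under a few assumptions: the label-encoding logic is made up for the example, col_b is a hypothetical non-categorical column, and a plain dict of columns stands in for a DataFrame so the snippet runs without any dependencies.

```python
from typing import Dict, List, Sequence

# Column name -> column values; a dependency-free stand-in for a DataFrame.
Table = Dict[str, List[object]]


def encode_categorical_features(data: Table, categorical_columns: Sequence[str]) -> Table:
    """Label-encode the listed columns; an empty list leaves the data unchanged."""
    # Copy each column so the input table is never mutated.
    encoded = {name: list(values) for name, values in data.items()}
    for column in categorical_columns:
        # Assign integer codes to distinct values in sorted order, so the
        # mapping is deterministic across runs.
        distinct = sorted(set(encoded[column]), key=str)
        codes = {value: code for code, value in enumerate(distinct)}
        encoded[column] = [codes[value] for value in encoded[column]]
    return encoded


# Hypothetical contents of whatever.joined_data.
joined_data = {
    "col_a": ["x", "y", "x"],
    "col_b": [1.0, 2.0, 3.0],
    "col_d": ["u", "u", "v"],
}

# Mirrors whatever.categorical_columns: [col_a, col_d].
with_encoding = encode_categorical_features(joined_data, ["col_a", "col_d"])

# A dataset without categorical columns would pass an empty list.
no_op = encode_categorical_features(joined_data, [])
```

In a Kedro pipeline, the node would presumably take joined_data and params:categorical_columns as inputs, and instantiating the pipeline with namespace whatever should resolve them to whatever.joined_data and params:whatever.categorical_columns. That wiring is left out here so the sketch stays runnable on its own.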
In this approach, if a dataset doesn't have text/categorical columns, the encode_categorical_features node will just be passed an empty list for categorical_columns, and the logic will be robust enough to essentially perform a no-op there.
Samuel Santos
03/04/2024, 7:48 PM
Samuel Santos
03/04/2024, 7:52 PM