Luis Chaves Rodriguez
01/14/2025, 6:12 PM
Hall
01/14/2025, 6:12 PM
datajoely
01/14/2025, 6:16 PM
Deepyaman Datta
01/14/2025, 9:55 PM
datajoely
01/14/2025, 9:56 PM
Luis Chaves Rodriguez
01/15/2025, 8:49 AM
Luis Chaves Rodriguez
01/15/2025, 8:51 AM
Luis Chaves Rodriguez
01/15/2025, 8:52 AM
datajoely
01/15/2025, 8:52 AM
Luis Chaves Rodriguez
01/15/2025, 8:54 AM
Luis Chaves Rodriguez
01/15/2025, 8:57 AM
boats:
  type: pandas.CSVDataset
  filepath: data/01_raw/boats.csv
cars:
  type: pandas.CSVDataset
  filepath: data/01_raw/cars.csv
planes:
  type: pandas.CSVDataset
  filepath: data/01_raw/planes.csv
??
datajoely
01/15/2025, 10:24 AM
f-string. In this case, the name of the input dataset factory_data matches the pattern {name}_data with the _data suffix, so it resolves name to factory. Similarly, it resolves name to process for the output dataset process_data.
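The resolution described above can be sketched in a few lines of Python. This is only an illustration using the standard library's re module, not Kedro's actual implementation; the helper name resolve is hypothetical and exists only for this example.

```python
import re


def resolve(pattern, dataset_name):
    """Return the placeholder values if dataset_name matches pattern, else None.

    Hypothetical helper for illustration; Kedro's own matching may differ.
    """
    # Split on {placeholder} tokens; odd-indexed pieces are placeholder names.
    pieces = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        f"(?P<{piece}>.+)" if i % 2 else re.escape(piece)
        for i, piece in enumerate(pieces)
    )
    match = re.fullmatch(regex, dataset_name)
    return match.groupdict() if match else None


print(resolve("{name}_data", "factory_data"))  # name resolves to "factory"
print(resolve("{name}_data", "process_data"))  # name resolves to "process"
print(resolve("{name}_data", "boats"))         # no _data suffix, so no match
```

Both factory_data and process_data match the single pattern, while a name without the _data suffix does not.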
This allows you to use one dataset factory pattern to replace multiple dataset entries. It keeps your catalog concise, and you can generalise datasets that share similar names, types or namespaces.
Luis Chaves Rodriguez
01/15/2025, 12:23 PM
Deepyaman Datta
01/15/2025, 2:13 PM
@Deepyaman Datta to me, writing functions that are a thin wrapper around some pandas/polars operations, much more straightforward to just read the plain dataframe operations in their native language
This is possible! For example, say you want to use https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.drop_nulls.html. Instead of defining a node function yourself, you can do from operator import methodcaller and use methodcaller("drop_nulls") as your node function.
Luis Chaves Rodriguez
01/16/2025, 8:33 AM
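For reference, the methodcaller suggestion above can be sketched as follows. This is a minimal sketch: TinyFrame is a hypothetical stand-in for a polars DataFrame so that the example runs without polars installed, and wiring the callable into a Kedro node is not shown.

```python
from operator import methodcaller


class TinyFrame:
    """Hypothetical stand-in for a polars DataFrame (illustration only)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def drop_nulls(self):
        # Keep only rows in which no value is None.
        return TinyFrame(r for r in self.rows if None not in r.values())


# A callable that invokes .drop_nulls() on whatever object it receives,
# so no hand-written wrapper function is needed.
drop_nulls = methodcaller("drop_nulls")

raw = TinyFrame([{"a": 1, "b": 2}, {"a": None, "b": 3}])
clean = drop_nulls(raw)  # equivalent to raw.drop_nulls()
print(clean.rows)
```

methodcaller also accepts extra positional and keyword arguments, which it forwards to the method on each call.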