Fernando Cabeza
09/11/2024, 4:20 PM

check_cols = ['a', 'b', 'c', 'd']
df = pd.read_csv(
    path,
    sep=";",
    usecols=lambda x: x in check_cols,
)
The reason for solving this problem is that I have a lot of similar CSV files with the same columns (using Kedro dataset factories), but some columns are missing in some files. Going back to the example: imagine one of the CSVs was missing column d; I would still want to load columns a, b, and c.
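For context, a minimal self-contained sketch of the behaviour described above (the file contents are hypothetical, stand-ins for the real CSVs):

```python
import io
import pandas as pd

check_cols = ['a', 'b', 'c', 'd']

# Two hypothetical CSVs with the same schema, except the second is missing column "d".
csv_full = "a;b;c;d\n1;2;3;4\n"
csv_missing_d = "a;b;c\n1;2;3\n"

# The callable form of `usecols` is evaluated against each column name actually
# present in the file, so columns absent from a given file are simply skipped
# instead of raising a ValueError (which a plain list of names would do).
kwargs = dict(sep=";", usecols=lambda col: col in check_cols)

df_full = pd.read_csv(io.StringIO(csv_full), **kwargs)
df_partial = pd.read_csv(io.StringIO(csv_missing_d), **kwargs)

print(list(df_full.columns))     # ['a', 'b', 'c', 'd']
print(list(df_partial.columns))  # ['a', 'b', 'c']
```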
Could you help me with this problem using the YAML API?

datajoely
09/11/2024, 4:21 PM

Yury Fedotov
09/11/2024, 10:22 PM

Fernando Cabeza
09/12/2024, 7:21 AM

datajoely
09/12/2024, 7:22 AM

Fernando Cabeza
09/12/2024, 7:27 AM

Vishal Pandey
09/13/2024, 7:59 AM

Fernando Cabeza
09/13/2024, 9:31 AM

df_{year}:
  type: pandas.CSVDataset
  filepath: path_with_{year}_dependence.csv
  load_args:
    usecols: "${usecols_callable:}"

settings.py:

check_cols = ['a', 'b', 'c', 'd']

CONFIG_LOADER_ARGS = {
    ....
    "custom_resolvers": {
        "usecols_callable": lambda: lambda x: x in check_cols,
    },
}
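Since inline lambdas cannot be pickled, a named module-level function can play the same role as the nested lambda above; a sketch, with the function names being hypothetical:

```python
import pickle

check_cols = ['a', 'b', 'c', 'd']

# A named, module-level function (e.g. defined in settings.py) is picklable,
# unlike a lambda defined inline inside CONFIG_LOADER_ARGS.
def keep_col(col):
    return col in check_cols

def usecols_callable():
    return keep_col

# Hypothetical registration, mirroring the CONFIG_LOADER_ARGS above:
CONFIG_LOADER_ARGS = {
    "custom_resolvers": {
        "usecols_callable": usecols_callable,
    },
}

# Unlike a nested lambda, this round-trips through pickle:
restored = pickle.loads(pickle.dumps(usecols_callable()))
print(restored('a'), restored('z'))  # True False
```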
That's correct; however, if you want to use ParallelRunner you will have problems with AttributeError: Can't pickle local object '<lambda>.<locals>.<lambda>'

Vishal Pandey
09/13/2024, 9:45 AM

datajoely
09/13/2024, 9:46 AM

Vishal Pandey
09/13/2024, 9:49 AM

select * from table_name where table_name.data = ${begin_date}

and we can define begin_date in custom_resolvers.

datajoely
09/13/2024, 9:50 AM
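Applying the same resolver pattern to the SQL example, here is a toy illustration of what a `begin_date` resolver could look like and what the config loader conceptually does when it resolves the placeholder. The function body, the fixed date, and the string substitution are all hypothetical simplifications, not Kedro's actual resolution code:

```python
# Hypothetical resolver, analogous to usecols_callable above
# (in a real project it would live in settings.py):
def begin_date():
    # a fixed example date rather than a real configuration value
    return "2024-01-01"

CONFIG_LOADER_ARGS = {
    "custom_resolvers": {
        "begin_date": begin_date,  # named function, so it stays picklable
    },
}

# What resolution conceptually does when it meets the placeholder:
sql_template = "select * from table_name where table_name.data = ${begin_date:}"
resolved_sql = sql_template.replace(
    "${begin_date:}", repr(CONFIG_LOADER_ARGS["custom_resolvers"]["begin_date"]())
)
print(resolved_sql)
# select * from table_name where table_name.data = '2024-01-01'
```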