Hugo Acosta
10/11/2024, 3:53 PM
```python
from kedro.pipeline import Pipeline, node, pipeline

# `settings` and `compare_id` are defined elsewhere in the project
# (its settings.py and the pipeline's nodes module).
load_date = settings.LOAD_DATE_COMPARISON.get("current")
previous_load_date = settings.LOAD_DATE_COMPARISON.get("previous")


def create_pipeline(**kwargs) -> Pipeline:
    format_data_quality = pipeline(
        [
            node(
                func=compare_id,
                inputs=[
                    f"maestro_indicadores_{load_date}",
                    f"maestro_indicadores_{previous_load_date}",
                ],
                outputs=f"compare_id_{load_date}_{previous_load_date}",
                name="compare_id_node",
                tags="compare_id",
            ),
        ]
    )
    return format_data_quality
```
With the corresponding catalog entry for the output:
```yaml
compare_id_{load_date}_{previous_load_date}:
  type: json.JSONDataset
  filepath: reports/{load_date}/id_comparison/id_comparison_{load_date}_{previous_load_date}.json
```
The issue is that whenever the value of load_date is something like 2024_07_01, the resolved path looks like:
reports/2024/id_comparison/id_comparison_2024_07_01_2024_05_01.json
Note that the {load_date} in the folder name gets only 2024 instead of the intended value, while the rest of the path looks correct.
This only happens when the value of load_date contains underscores; it does not happen with dots or hyphens.
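To make the behaviour concrete, here is a minimal stdlib-only sketch. It assumes that `parse` turns each bare `{name}` field into a lazy `(.+?)` regex group and requires the whole dataset name to match, and that the captured values are then filled into the catalog `filepath` with `str.format`. `pattern_to_regex` is a hypothetical helper written for illustration, not part of Kedro's or parse's API.

```python
import re

PATTERN = "compare_id_{load_date}_{previous_load_date}"
FILEPATH = "reports/{load_date}/id_comparison/id_comparison_{load_date}_{previous_load_date}.json"


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Hypothetical helper approximating how parse treats bare {name} fields:
    each field becomes a lazy (.+?) group between escaped literal text."""
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        f"(?P<{part}>.+?)" if i % 2 else re.escape(part)  # odd indices are field names
        for i, part in enumerate(parts)
    )
    return re.compile(regex)


# Dataset name the node produces for the dates in the question
name = PATTERN.format(load_date="2024_07_01", previous_load_date="2024_05_01")

# fullmatch: the whole name must match the pattern
fields = pattern_to_regex(PATTERN).fullmatch(name).groupdict()
print(fields)  # {'load_date': '2024', 'previous_load_date': '07_01_2024_05_01'}
print(FILEPATH.format(**fields))  # reports/2024/id_comparison/id_comparison_2024_07_01_2024_05_01.json
```

With hyphenated dates such as 2024-07-01 the only underscore left is the one between the two dates, so the lazy group stops there and the folder name comes out right, which matches the observation about dots and hyphens.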
Why does this happen?

Rashida Kanchwala
10/11/2024, 4:06 PM

Hugo Acosta
10/11/2024, 4:21 PM

Nok Lam Chan
10/11/2024, 4:31 PM
`kedro catalog resolve` to understand this better? Is it using the pattern that you intend to use?

Nok Lam Chan
10/11/2024, 4:32 PM
`2024_07_07` somehow become `2024`?

Hugo Acosta
10/11/2024, 4:49 PM

Nok Lam Chan
10/11/2024, 6:25 PM

Vishal Pandey
10/12/2024, 9:23 AM
`settings.LOAD_DATE_COMPARISON.get("current")`
What kind of object is `LOAD_DATE_COMPARISON`, and how is it defined in `settings.py`?

Ankita Katiyar
10/12/2024, 10:15 AM
The `parse` (https://pypi.org/project/parse/) library that we use under the hood for matching dataset names to patterns works this way. For `compare_id_{load_date}_{previous_load_date}` it ends the first placeholder at the first underscore after `compare_id_`, so `{load_date}` captures only `2024`. It’s expected behaviour, and I’d recommend using a different separator between the dates for this output dataset.

Hugo Acosta
10/14/2024, 10:53 AM

Hugo Acosta
10/14/2024, 10:56 AM
```python
LOAD_DATE_COMPARISON = globals_config["load_dates_comparison"]
```
Which refers to the globals.yml file, where:
```yaml
load_dates_comparison:
  previous: "2024_07_01"
  current: "2024_10_07"
```
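Ankita's suggestion of a different separator can be checked against these dates before touching the pipeline. This stdlib-only sketch assumes that `parse` treats each bare `{name}` field as a lazy `(.+?)` group and matches the whole dataset name; `pattern_to_regex` and `round_trips` are hypothetical helpers, and `__` and `_vs_` are only example separators (any string that can never occur inside a date should behave the same).

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Approximate parse's bare {name} fields as lazy (.+?) groups."""
    parts = re.split(r"\{(\w+)\}", pattern)
    return re.compile("".join(
        f"(?P<{part}>.+?)" if i % 2 else re.escape(part)  # odd indices are field names
        for i, part in enumerate(parts)
    ))


def round_trips(pattern: str, **values: str) -> bool:
    """True if formatting `values` into `pattern` and matching the resulting
    dataset name against the same pattern gives back exactly `values`."""
    name = pattern.format(**values)
    match = pattern_to_regex(pattern).fullmatch(name)
    return match is not None and match.groupdict() == values


# Values from globals.yml: current -> load_date, previous -> previous_load_date
dates = {"load_date": "2024_10_07", "previous_load_date": "2024_07_01"}

print(round_trips("compare_id_{load_date}_{previous_load_date}", **dates))     # False
print(round_trips("compare_id_{load_date}__{previous_load_date}", **dates))    # True
print(round_trips("compare_id_{load_date}_vs_{previous_load_date}", **dates))  # True
```

In this model only the dataset name (the node's `outputs` and the catalog key) is matched against the pattern; the `filepath` is just filled in with the captured values, so it can keep single underscores.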

Hugo Acosta
10/14/2024, 1:44 PM

Ankita Katiyar
10/14/2024, 1:50 PM
The `parse` library works in such a way that the first match that satisfies the pattern is returned.
So all of these satisfy the pattern:
something_{2024}_{07_01_2024_10_07}
something_{2024_07}_{01_2024_10_07}
something_{2024_07_01}_{2024_10_07}
but the `parse` library returns the first match, i.e. the one where the first placeholder is as short as possible.
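Ankita's examples can be generalised: every underscore after `something_` is a valid place to split the two fields, and a lazy first group picks the leftmost one. The sketch below enumerates the splits with plain Python and compares them with a lazy `re` match standing in for `parse` (an approximation of its default field behaviour, not the library itself); the variable names are illustrative.

```python
import re

prefix = "something_"
name = prefix + "2024_07_01_2024_10_07"
rest = name[len(prefix):]

# Every underscore in `rest` is a valid boundary between the two fields
candidates = [(rest[:i], rest[i + 1:]) for i, ch in enumerate(rest) if ch == "_"]
for first, second in candidates:
    print(f"{prefix}{{{first}}}_{{{second}}}")

# A lazy first group (like parse's default field) stops at the first boundary
lazy = re.fullmatch(r"something_(?P<first>.+?)_(?P<second>.+?)", name)
print(lazy.groups())  # ('2024', '07_01_2024_10_07')
```

It prints five candidate splits (the three above plus two more), and the lazy match is the first of them.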