# questions
z
Hi team, do we support customized input from parameters into the SQL query/file? I tried the following the codes file but Kedro couldn't read the parameters and input into sql query.
#globals.yml
order_number: 'abc'
#catalog.yml
sql:
  type: pandas.SQLQueryDataset
  sql: "SELECT * FROM table WHERE column = ${order_number}"
n
https://github.com/kedro-org/kedro/discussions/1782 Are you on 0.18.x? You may need a custom config loader.
👍 1
But please be aware of the risk of SQL injection
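For context, on Kedro 0.18.x the `${order_number}` substitution above is typically enabled by pointing `settings.py` at `TemplatedConfigLoader` — a minimal sketch (the exact `globals_pattern` glob here is an assumption):

```python
# src/<package>/settings.py -- sketch, assuming Kedro 0.18.x
from kedro.config import TemplatedConfigLoader

CONFIG_LOADER_CLASS = TemplatedConfigLoader
# Values from any matching globals file become ${...} substitutions
# in catalog.yml (e.g. ${order_number} above).
CONFIG_LOADER_ARGS = {"globals_pattern": "*globals.yml"}
```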
z
It works!! Amazing! Thanks for the help! I will pay attention to the risks. Thanks for the reminder.
One more question: can I put yml files not in conf/base/ but in some other subfolder, e.g. conf/base/parameters? And if I put the SQL query in a separate .sql file and pass in the filepath instead of the SQL query directly in catalog.yml, is there any way to pass the parameters as well?
n
Sure! By default it recognises this pattern, and it's also customisable since 0.18.5:
"parameters": ["params*", "params*/**", "**/params*"]
That's also how modular pipelines work by default:
conf/base/parameters/<pipeline_name>.yml
You can create that pipeline structure with
kedro pipeline create <pipeline_name>
If you use the filepath option, there isn't a native way of injecting runtime parameters.
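If you do need to change those discovery patterns, a sketch of what that might look like in settings.py (assuming a Kedro version >= 0.18.5 where config_patterns is exposed; the extra glob is purely illustrative):

```python
# src/<package>/settings.py -- sketch, assuming Kedro >= 0.18.5
CONFIG_LOADER_ARGS = {
    "config_patterns": {
        # The three default parameter globs, plus a hypothetical extra one:
        "parameters": ["params*", "params*/**", "**/params*", "**/custom_params*"],
    }
}
```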
👍 1
z
I tried
CONFIG_LOADER_ARGS = {
    "globals_pattern": "*pipeline_XXX.yml",
}
But it seems that it only looks for yml files in the conf/base or conf/local directory? I put this yml file in conf/base/parameters.
n
You don’t need to pass in any custom argument in this case.
"parameters": ["params*", "params*/**", "**/params*"]
This is the default pattern
👍 1
z
That works!! Thanks so much for the support. BTW, if I do want to inject runtime parameters into my .sql file and pass it as filepath, is there any way I could do that? Should I use Jinja or something?
n
Jinja should be fine, but this may be quite advanced and not so straightforward. I think the easiest way is to overload SQLQueryDataSet, pass the "template variables" into the dataset, and compile the SQL query in the _load method.
z
If I compile the SQL query in the _load method, can I still put the query in a separate .sql file? The context is that we want to put all SQL queries in a separate folder for better organisation.
n
With the current implementation, there are two ways you can pass in the SQL: 1. using the sql field, which already works with parameters; 2. using filepath, which loads the .sql file directly and executes the query.
The idea of this CustomSQLDataSet is that you add additional logic for (2): after loading the file, you further compile it with Jinja + the template variables, then execute the compiled version.
z
that's to say if I want to go with option 2, I must use Jinja to further compile the query?
n
def _load(self) -> pd.DataFrame:
    load_args = copy.deepcopy(self._load_args)
    engine = self.engines[self._connection_str].execution_options(
        **self._execution_options
    )  # type: ignore

    if self._filepath:
        load_path = get_filepath_str(PurePosixPath(self._filepath), self._protocol)
        with self._fs.open(load_path, mode="r") as fs_file:
            load_args["sql"] = fs_file.read()

        ## Add your Jinja rendering here, e.g. on load_args["sql"]

    return pd.read_sql_query(con=engine, **load_args)
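A minimal sketch of what the Jinja step at that comment could look like (jinja2 and the template_args dict are assumptions here, not part of SQLQueryDataSet's real API):

```python
# Sketch: render a loaded .sql file with Jinja2 before executing it.
from jinja2 import Template

# In the custom dataset, raw_sql would be fs_file.read(), and
# template_args a dict passed to __init__ from catalog.yml.
raw_sql = "SELECT * FROM orders WHERE order_number = '{{ order_number }}'"
template_args = {"order_number": "abc"}

# Render the template into a plain SQL string.
compiled_sql = Template(raw_sql).render(**template_args)
# -> SELECT * FROM orders WHERE order_number = 'abc'
```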
z
oh I see!! Let me try to see how Jinja could help with option 2
n
It's just off the top of my head. You can also use normal SQL bind variables, which avoid the SQL injection problem.
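To illustrate the bind-variable route — a sketch using sqlite3 standing in for the real database connection (the table and values are made up):

```python
# Sketch: bind parameters let the driver handle quoting/escaping,
# so the value is never string-interpolated into the SQL itself.
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_number TEXT, qty INTEGER)")
conn.execute("INSERT INTO orders VALUES ('abc', 2)")

df = pd.read_sql_query(
    "SELECT * FROM orders WHERE order_number = ?",  # ? is the bind placeholder
    conn,
    params=("abc",),
)
print(df)  # one matching row: order_number='abc', qty=2
```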
z
But we can't have bind variables defined from the .yml file? Really appreciate your time and effort in helping me with this problem! Thanks a lot.