# questions
z
Hi team, do we support customized input from parameters into the SQL query/file? I tried the following the codes file but Kedro couldn't read the parameters and input into sql query.
#globals.yml
order_number: 'abc'
#catalog.yml
sql:
  type: pandas.SQLQueryDataset
  sql: "SELECT * FROM table WHERE column = ${order_number}"
n
https://github.com/kedro-org/kedro/discussions/1782 Are you on 0.18.x? You may need a custom config loader.
👍 1
But please be aware of the risk of SQL injection
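For context, on Kedro 0.18.x the `${order_number}` substitution above is typically enabled by pointing `settings.py` at `TemplatedConfigLoader` — a minimal sketch (the exact `globals_pattern` glob here is an assumption):

```python
# src/<package>/settings.py -- sketch, assuming Kedro 0.18.x
from kedro.config import TemplatedConfigLoader

CONFIG_LOADER_CLASS = TemplatedConfigLoader
# Values from any matching globals file become ${...} substitutions
# in catalog.yml (e.g. ${order_number} above).
CONFIG_LOADER_ARGS = {"globals_pattern": "*globals.yml"}
```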
z
It works!! Amazing! Thanks for the help! I will pay attention to the risks. Thanks for the reminder.
One more question: can I put yml files not in conf/base/ but in some other subfolder, e.g. conf/base/parameters? And if I put the SQL query in a separate .sql file and pass in the filepath instead of the SQL query directly in catalog.yml, is there any way to pass the parameters as well?
n
Sure! By default it recognises this pattern, and it's also customisable since 0.18.5:
"parameters": ["params*", "params*/**", "**/params*"]
That's also how modular pipelines work by default:
conf/base/parameters/<pipeline_name>.yml
You can create that pipeline structure with
kedro pipeline create <pipeline_name>
If you use the filepath option, there isn't a native way of injecting runtime parameters.
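If you do need to change those discovery patterns, a sketch of what that might look like in settings.py (assuming a Kedro version >= 0.18.5 where config_patterns is exposed; the extra glob is purely illustrative):

```python
# src/<package>/settings.py -- sketch, assuming Kedro >= 0.18.5
CONFIG_LOADER_ARGS = {
    "config_patterns": {
        # The three default parameter globs, plus a hypothetical extra one:
        "parameters": ["params*", "params*/**", "**/params*", "**/custom_params*"],
    }
}
```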
👍 1
z
I tried
CONFIG_LOADER_ARGS = {
    "globals_pattern": "*pipeline_XXX.yml",
}
But it seems that it only looks for yml files in the conf/base or conf/local directory? I put this yml file in conf/base/parameters.
n
You don’t need to pass in any custom argument in this case.
"parameters": ["params*", "params*/**", "**/params*"]
This is the default pattern
👍 1
z
That works!! Thanks so much for the support. BTW, if I do want to inject runtime parameters into my .sql file and pass it as filepath, is there any way I could do that? Should I use Jinja or something?
n
Jinja should be fine, but this may be quite advanced and not so straightforward. I think the easiest way is to overload SQLQueryDataSet, pass the "template variables" into the dataset, and compile the SQL query in the _load method.
z
If I compile the SQL query in the _load method, can I still put the query in a separate .sql file? The context is that we want to put all SQL queries in a separate folder for better organisation.
n
With the current implementation, there are two ways you can pass in the SQL: 1. using the sql field, which already works with parameters; 2. using filepath, which loads the .sql file directly and executes the query.
The idea of this CustomSQLDataSet is that you add additional logic for (2): after loading the file, you further compile it with Jinja + the template variables, then execute the compiled version.
z
that's to say if I want to go with option 2, I must use Jinja to further compile the query?
n
def _load(self) -> pd.DataFrame:
    load_args = copy.deepcopy(self._load_args)
    engine = self.engines[self._connection_str].execution_options(
        **self._execution_options
    )  # type: ignore

    if self._filepath:
        load_path = get_filepath_str(PurePosixPath(self._filepath), self._protocol)
        with self._fs.open(load_path, mode="r") as fs_file:
            load_args["sql"] = fs_file.read()

        ## Add your Jinja rendering here, e.g. on load_args["sql"]

    return pd.read_sql_query(con=engine, **load_args)
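A minimal sketch of what the Jinja step at that comment could look like (jinja2 and the template_args dict are assumptions here, not part of SQLQueryDataSet's real API):

```python
# Sketch: render a loaded .sql file with Jinja2 before executing it.
from jinja2 import Template

# In the custom dataset, raw_sql would be fs_file.read(), and
# template_args a dict passed to __init__ from catalog.yml.
raw_sql = "SELECT * FROM orders WHERE order_number = '{{ order_number }}'"
template_args = {"order_number": "abc"}

# Render the template into a plain SQL string.
compiled_sql = Template(raw_sql).render(**template_args)
# -> SELECT * FROM orders WHERE order_number = 'abc'
```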
z
oh I see!! Let me try to see how Jinja could help with option 2
n
It's just off the top of my head. You can also use normal SQL bind variables, which avoid the SQL injection problem.
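To illustrate the bind-variable route — a sketch using sqlite3 standing in for the real database connection (the table and values are made up):

```python
# Sketch: bind parameters let the driver handle quoting/escaping,
# so the value is never string-interpolated into the SQL itself.
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_number TEXT, qty INTEGER)")
conn.execute("INSERT INTO orders VALUES ('abc', 2)")

df = pd.read_sql_query(
    "SELECT * FROM orders WHERE order_number = ?",  # ? is the bind placeholder
    conn,
    params=("abc",),
)
print(df)  # one matching row: order_number='abc', qty=2
```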
z
But we can't have bind variables defined from the .yml file? Really appreciate your time and effort in helping me with this problem! Thanks a lot.