Mark Druffel
09/13/2024, 6:44 PM
Invalid Input Error: Could not set option "schema" as a global option.
```yaml
bronze_x:
  type: ibis.TableDataset
  filepath: x.csv
  file_format: csv
  table_name: x
  backend: duckdb
  database: data.duckdb
  schema: bronze
```
I can reproduce this error with vanilla ibis:
```python
con = ibis.duckdb.connect(database="data.duckdb", schema="bronze")
```
Found a related question on ibis' GitHub; it sounds like DuckDB can't set the schema globally, so it has to be done in the table functions. Wondering if this would require a change to `ibis.TableDataset`, and if so, would this pattern work the same with other backends?
Deepyaman Datta
09/13/2024, 6:56 PM
> Wondering if this would require a change to ibis.TableDataset
Probably. If I understand correctly, this would be a request to pass `schema` (actually `database`, since `schema` is deprecated as an argument to `table`) as a `table_arg` or something in the dataset?
> and if so, would this pattern work the same with other backends?
I think so, because https://github.com/ibis-project/ibis/blob/main/ibis/backends/sql/__init__.py#L47 for example (called in the `table()` function) is generic to SQL backends.
Mark Druffel
09/13/2024, 7:36 PM
`table(database)` is actually equivalent to catalog, database, or schema in Hive.
On the ibis side, it feels like `do_connect` using a database parameter is confusing. For example:
```python
con = ibis.duckdb.connect(database="data/db/spotify/spotify.duckdb")
con.create_database(name="bronze")
con.create_database(name="silver")
con.create_database(name="gold")
con.table("x", database="bronze")
```
The `create_database` & `table` calls use `database` to mean something completely different.
On the kedro-datasets side, my question becomes: would it make sense to accept an argument called "schema" and just pass that to `table(database=schema)`, since the "database" argument is already used in the connection string for `do_connect`?
Deepyaman Datta
09/14/2024, 1:45 AM
I'd lean toward `table_args` (i.e. arguments that get passed to the underlying `table()` call), since that's more in line with how most Kedro datasets are structured in my experience, but I haven't thought about it that much.
By the way, I've created an issue for this: https://github.com/kedro-org/kedro-plugins/issues/833 (since I was having trouble sharing some of this context with the Ibis team otherwise).
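A hypothetical catalog entry following the `table_args` idea might look like the sketch below. Note that `table_args` is a proposed name from this thread, not an existing `ibis.TableDataset` option, and the key layout mirrors the entry from the top of the thread rather than any finalized design:

```yaml
bronze_x:
  type: ibis.TableDataset
  table_name: x
  backend: duckdb
  database: data.duckdb
  table_args:
    # forwarded as con.table("x", database="bronze")
    database: bronze
```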