# plugins-integrations
m
Has anyone used `ibis.TableDataset` with DuckDB schemas? If I set a schema on a data catalog entry, I get the error `Invalid Input Error: Could not set option "schema" as a global option`.
bronze_x:
  type: ibis.TableDataset
  filepath: x.csv
  file_format: csv
  table_name: x
  backend: duckdb
  database: data.duckdb
  schema: bronze
I can reproduce this error with vanilla ibis:
import ibis
con = ibis.duckdb.connect(database="data.duckdb", schema="bronze")
I found a related question on the Ibis GitHub. It sounds like DuckDB can't set the schema globally, so it has to be set in the table functions. I'm wondering whether this would require a change to `ibis.TableDataset` and, if so, whether this pattern would work the same with other backends.
d
Wondering if this would require a change to ibis.TableDataset,
Probably. If I understand correctly, this would be a request to pass `schema` (actually `database`, since `schema` is deprecated as an argument to `table`) as a `table_arg` or something in the dataset?
and if so, would this pattern work the same with other backends?
I think so, because https://github.com/ibis-project/ibis/blob/main/ibis/backends/sql/__init__.py#L47, for example (called in the `table()` function), is generic to SQL backends.
m
Yeah, I was just looking through some of the other backends, and I agree. I'm trying to check PySpark now to recall whether `table(database)` is actually equivalent to catalog, database, or schema in Hive. On the Ibis side, it feels like `do_connect` using a `database` parameter is confusing. For example:
import ibis

con = ibis.duckdb.connect(database="data/db/spotify/spotify.duckdb")
con.create_database(name="bronze")
con.create_database(name="silver")
con.create_database(name="gold")
con.table("x", database="bronze")
The `create_database` and `table` calls use `database` to mean something completely different from what it means in `do_connect`. On the kedro-datasets side, my question becomes: would it make sense to accept an argument called `schema` and just pass that to `table(database=schema)`, since the `database` argument is already used in the connection string for `do_connect`?
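To make the question concrete, here is a rough sketch of that mapping in plain Python. It is only an illustration: `split_catalog_entry` is a hypothetical helper (it exists in neither kedro-datasets nor Ibis), and it assumes the catalog entry keeps `database` for the connection and uses `schema` only for the table lookup.

```python
from typing import Any, Dict, Tuple


def split_catalog_entry(entry: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: separate connection options from table-lookup options.

    The catalog's `schema` key is renamed to `database` for the table() call,
    while the catalog's `database` key stays with the connection.
    """
    entry = dict(entry)  # work on a copy so the caller's dict is untouched
    schema = entry.pop("schema", None)
    connect_kwargs = {key: entry[key] for key in ("backend", "database") if key in entry}
    table_kwargs = {"database": schema} if schema is not None else {}
    return connect_kwargs, table_kwargs


# The catalog entry from the start of the thread, as a plain dict.
bronze_x = {
    "table_name": "x",
    "backend": "duckdb",
    "database": "data.duckdb",
    "schema": "bronze",
}

connect_kwargs, table_kwargs = split_catalog_entry(bronze_x)
print(connect_kwargs)
print(table_kwargs)
```

With this split, `schema` never reaches the connection call, which is where the "global option" error came from; it would only be forwarded as `table(..., database="bronze")`.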
d
Sorry, I missed your reply. In short: I'd be inclined to match the Ibis name but create a section for `table_args` (i.e. arguments that get passed to the underlying `table()` call), since that's more in line with how most Kedro datasets are structured in my experience. I haven't thought about it that much, though. By the way, I've created an issue for this: https://github.com/kedro-org/kedro-plugins/issues/833 (I was having trouble sharing some of this context with the Ibis team otherwise, as needed).
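As a rough sketch of what a `table_args` section could look like on the loading side: the `load_table` function and the `RecordingConnection` stand-in below are both hypothetical, written only to show the forwarding. Nothing here is the actual kedro-datasets implementation, and the stand-in merely mimics the shape of `con.table(name, database=...)` so the example runs without a real backend.

```python
from typing import Any, Mapping, Optional


def load_table(con: Any, table_name: str, table_args: Optional[Mapping[str, Any]] = None) -> Any:
    """Hypothetical loader: forward table_args untouched to con.table()."""
    return con.table(table_name, **dict(table_args or {}))


class RecordingConnection:
    """Stand-in for a backend connection; it only records how table() was called."""

    def table(self, name: str, database: Optional[str] = None) -> dict:
        return {"name": name, "database": database}


# Keeping the Ibis argument name: the catalog would say `database: bronze`
# under a `table_args` section rather than a top-level `schema` key.
result = load_table(RecordingConnection(), "x", {"database": "bronze"})
print(result)
```

Because the arguments are passed through by name, the dataset would not need to know which keywords a given backend's `table()` accepts.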