Hi everyone I have been exploring ibis for someti...
# questions
v
Hi everyone I have been exploring ibis for sometime. I just wanted to understand is there a better way to write the below code in a more optimised fashion
Copy code
import ibis

con = ibis.connect(POSTGRES_CONNECTION_STRING)
training_meta_table:ir.Table = con.table("training_metadata")

filters = {
    "customer_ids" : [59] ,
    "queue_names" : ["General Lit - Misclassifications", "MoveDocs-MR"],
    "start_date" : "2024-09-5 00:00:00",
    "end_date" : "2024-09-11 00:00:00",
    "doc_types" : [],
    "fields" : ["patientFirstName", "patientLastName", "Service Date", "Doctor"]
}
field_conditions = training_meta_table.fields_present.contains(filters["fields"][0]) | training_meta_table.fields_present.contains(filters["fields"][1]) | training_meta_table.fields_present.contains(filters["fields"][2]) | training_meta_table.fields_present.contains(filters["fields"][3])
So there are many or conditions we would like to dynamically join together to create 1 final condition based on the input filters
@Nok Lam Chan @datajoely Any experience with ibis
v
@datajoely just referred it , i could not make much use of it. if you see we need to kind of iterate through all field conditions(List) present in the filters dictionary and create ibis expression for each of them.
d
A bit of playing with chatgpt, I think this is the most readable:
Copy code
import ibis
from functools import reduce
import operator

# Establish connection
con = ibis.connect(POSTGRES_CONNECTION_STRING)
training_meta = con.table("training_metadata")

# Define filters
filters = {
    "customer_ids": [59],
    "queue_names": ["General Lit - Misclassifications", "MoveDocs-MR"],
    "start_date": "2024-09-05 00:00:00",
    "end_date": "2024-09-11 00:00:00",
    "fields": ["patientFirstName", "patientLastName", "Service Date", "Doctor"]
}

# Helper function to build filter conditions
def build_filters(table, filters):
    conditions = [
        table.customer_id.isin(filters["customer_ids"]),
        table.queue_name.isin(filters["queue_names"]),
        table.date.between(filters["start_date"], filters["end_date"]),
        table.fields_present.overlaps(filters["fields"])
    ]
    return reduce(operator.and_, conditions)

# Apply filters and select fields using the helper function
result = (
    training_meta
    .filter(build_filters(training_meta, filters))
    .select(filters["fields"])
)

# Execute the query (uncomment when ready)
# fetched_results = result.execute()
👍 1
v
Thanks @datajoely will test and let you know. But I am curious to know the use of
overlaps()
here. Just a bit of context here - The fields_present column in the table contains comma separated strings , attaching some examples as well here -
Copy code
Document Date, Dr. Name, Doctor, Sender Free Text, Sender, Sender Logo, Sender Signature

Document Type, patientDOB, patientFirstName, patientLastName, providerEntity, referringPhysician, serviceDate
and we have to return the row if any of the value present in filters["fields"] is found as a substring in the fields_present column and that is why I used the code below :
training_meta_table.fields_present.contains(filters["fields"][0])
contains()
comes from ibis string expressions. But you see it is difficult to manually create such ibis expressions for the whole list of values