Hi everyone I have been exploring ibis for sometime I just w Kedro #questions

Hi everyone I have been exploring ibis for someti...

Vishal Pandey

10/15/2024, 8:57 AM

Hi everyone I have been exploring ibis for sometime. I just wanted to understand is there a better way to write the below code in a more optimised fashion

Copy code

import ibis

con = ibis.connect(POSTGRES_CONNECTION_STRING)
training_meta_table:ir.Table = con.table("training_metadata")

filters = {
    "customer_ids" : [59] ,
    "queue_names" : ["General Lit - Misclassifications", "MoveDocs-MR"],
    "start_date" : "2024-09-5 00:00:00",
    "end_date" : "2024-09-11 00:00:00",
    "doc_types" : [],
    "fields" : ["patientFirstName", "patientLastName", "Service Date", "Doctor"]
}
field_conditions = training_meta_table.fields_present.contains(filters["fields"][0]) | training_meta_table.fields_present.contains(filters["fields"][1]) | training_meta_table.fields_present.contains(filters["fields"][2]) | training_meta_table.fields_present.contains(filters["fields"][3])

So there are many or conditions we would like to dynamically join together to create 1 final condition based on the input filters

Vishal Pandey

10/15/2024, 8:58 AM

@Nok Lam Chan @datajoely Any experience with ibis

datajoely

10/15/2024, 9:01 AM

https://ibis-project.org/how-to/analytics/chain_expressions.html

Vishal Pandey

10/15/2024, 9:05 AM

@datajoely just referred it , i could not make much use of it. if you see we need to kind of iterate through all field conditions(List) present in the filters dictionary and create ibis expression for each of them.

datajoely

10/15/2024, 10:05 AM

A bit of playing with chatgpt, I think this is the most readable:

Copy code

import ibis
from functools import reduce
import operator

# Establish connection
con = ibis.connect(POSTGRES_CONNECTION_STRING)
training_meta = con.table("training_metadata")

# Define filters
filters = {
    "customer_ids": [59],
    "queue_names": ["General Lit - Misclassifications", "MoveDocs-MR"],
    "start_date": "2024-09-05 00:00:00",
    "end_date": "2024-09-11 00:00:00",
    "fields": ["patientFirstName", "patientLastName", "Service Date", "Doctor"]
}

# Helper function to build filter conditions
def build_filters(table, filters):
    conditions = [
        table.customer_id.isin(filters["customer_ids"]),
        table.queue_name.isin(filters["queue_names"]),
        table.date.between(filters["start_date"], filters["end_date"]),
        table.fields_present.overlaps(filters["fields"])
    ]
    return reduce(operator.and_, conditions)

# Apply filters and select fields using the helper function
result = (
    training_meta
    .filter(build_filters(training_meta, filters))
    .select(filters["fields"])
)

# Execute the query (uncomment when ready)
# fetched_results = result.execute()

👍 1

Vishal Pandey

10/15/2024, 10:12 AM

Thanks @datajoely will test and let you know. But I am curious to know the use of

overlaps()

here. Just a bit of context here - The fields_present column in the table contains comma separated strings , attaching some examples as well here -

Copy code

Document Date, Dr. Name, Doctor, Sender Free Text, Sender, Sender Logo, Sender Signature

Document Type, patientDOB, patientFirstName, patientLastName, providerEntity, referringPhysician, serviceDate

and we have to return the row if any of the value present in filters["fields"] is found as a substring in the fields_present column and that is why I used the code below :

training_meta_table.fields_present.contains(filters["fields"][0])

contains()

comes from ibis string expressions. But you see it is difficult to manually create such ibis expressions for the whole list of values

3 Views

Open in Slack

Previous Next