Juan Luis
08/30/2023, 1:58 PM
bigframes, DataFrame APIs for BigQuery 🔥 https://cloud.google.com/python/docs/reference/bigframes/latest, https://pypi.org/project/bigframes/
datajoely
08/30/2023, 2:04 PM
Juan Luis
08/30/2023, 2:07 PM
Cody Peterson
08/30/2023, 2:12 PM
Deepyaman Datta
08/30/2023, 2:39 PM
`Block` concept, but looking more into it, seems like it's just their version of `NDFrame`.
"might be adaptable more generally too?"
I think so. Their `Session` pretty much creates your `ibis.Backend` instance, and there's not much backend-specific until you get to `to_pandas()` and you use that Session in executing the Ibis expression. https://github.com/googleapis/python-bigquery-dataframes/blob/main/bigframes/session.py#L299-L306 and then places like https://github.com/googleapis/python-bigquery-dataframes/blob/bf6ecb81afeb199b3dad07d1fd2057668352f939/bigframes/core/scalar.py#L57
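To make that pattern concrete, here's a toy sketch (not bigframes or Ibis code): a session owns the backend connection, frames only accumulate a query, and nothing touches the backend until the final materializing call. `ToySession`, `ToyFrame`, `read_rows`, and `to_rows` are made-up names for illustration, and the stdlib `sqlite3` module stands in for the real backend.

```python
import sqlite3


class ToySession:
    """Hypothetical session that owns the backend connection."""

    def __init__(self):
        # Assumption for this sketch: an in-memory SQLite database
        # plays the role that the Ibis backend plays in bigframes.
        self.conn = sqlite3.connect(":memory:")

    def read_rows(self, table, columns, rows):
        """Load rows into the backend and return a lazy frame over them."""
        cols = ", ".join(columns)
        marks = ", ".join("?" for _ in columns)
        self.conn.execute(f"CREATE TABLE {table} ({cols})")
        self.conn.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)
        return ToyFrame(self, f"SELECT {cols} FROM {table}")


class ToyFrame:
    """Hypothetical lazy frame: holds a query string, not data."""

    def __init__(self, session, sql):
        self.session = session
        self.sql = sql

    def filter(self, predicate):
        # Backend-agnostic step: only wraps the query, nothing is executed.
        wrapped = f"SELECT * FROM ({self.sql}) WHERE {predicate}"
        return ToyFrame(self.session, wrapped)

    def to_rows(self):
        # The only place where the session's backend is actually used.
        return self.session.conn.execute(self.sql).fetchall()


session = ToySession()
df = session.read_rows("t", ["name", "n"], [("a", 1), ("b", 5), ("c", 9)])
big = df.filter("n > 2")   # still lazy, just a new query string
result = big.to_rows()     # execution happens here, through the session
```

Swapping the backend in a design like this would mostly mean changing what the session constructs, which is roughly the point about it being adaptable.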
There are bits and pieces with specific BigQuery-related limitations, but they look pretty easy to pick out when making it more generic.
Deepyaman Datta
08/30/2023, 2:45 PM
datajoely
08/30/2023, 3:00 PM
Juan Luis
08/30/2023, 3:01 PM
datajoely
08/30/2023, 3:01 PM
datajoely
08/30/2023, 3:01 PM
datajoely
08/30/2023, 3:01 PM
Cody Peterson
08/30/2023, 3:07 PM
Cody Peterson
08/30/2023, 3:08 PM
datajoely
08/30/2023, 3:13 PM
Deepyaman Datta
08/30/2023, 3:26 PM
"we're moving toward a world where each cloud data platform has their own similar, but slightly different dataframe implementation (PySpark dataframes or pandas on PySpark for Databricks/Synapse, Snowpark for Snowflake, Bigframes for BigQuery, etc.)"
Snowpark is a bit different, in that it isn't really a pandas API (as far as I know).
pyspark.pandas and BigFrames both try to stay as close to the pandas API as possible. The whole pyspark.pandas test suite checks equivalence to pandas operations/syntax; BigFrames vendors pandas itself so it doesn't need to rewrite docstrings 😂 (pyspark.pandas is not as fancy about it, but most of the docstrings are copied word-for-word)
But, if more vendors/projects down the line want that pandas-equivalent API layer, I can definitely see them using this + Ibis, since it's a lot of extra work/maintenance that most teams would love to avoid. So I think/hope we'd see not-so-different dataframe implementations in the end 🙂
Cody Peterson
08/30/2023, 3:30 PM
Cody Peterson
08/30/2023, 3:30 PM
Cody Peterson
08/30/2023, 3:31 PM