Juan Luis
08/30/2023, 1:58 PM
`bigframes`, DataFrame APIs for BigQuery 🔥 https://cloud.google.com/python/docs/reference/bigframes/latest, https://pypi.org/project/bigframes/

datajoely
08/30/2023, 2:04 PM

Juan Luis
08/30/2023, 2:07 PM

Cody Peterson
08/30/2023, 2:12 PM

Deepyaman Datta
08/30/2023, 2:39 PM
`Block` concept, but looking more into it, seems like it's just their version of `NDFrame`.
> might be adaptable more generally too?
I think so. Their `Session` pretty much creates your `ibis.Backend` instance, and there's not much that's backend-specific until you get to `to_pandas()` and use that `Session` to execute the Ibis expression. See https://github.com/googleapis/python-bigquery-dataframes/blob/main/bigframes/session.py#L299-L306 and then places like https://github.com/googleapis/python-bigquery-dataframes/blob/bf6ecb81afeb199b3dad07d1fd2057668352f939/bigframes/core/scalar.py#L57
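A minimal, stdlib-only sketch of the pattern described above, assuming a simplified design: a session object owns the backend, frame operations only build up a deferred query, and nothing runs until the frame is materialized (the role `to_pandas()` plays here). `sqlite3` stands in for the Ibis backend, and `Session`, `LazyFrame`, `where`, `select`, `run`, and `collect` are hypothetical names for illustration, not the bigframes or Ibis API.

```python
from __future__ import annotations

import sqlite3

# Hypothetical names throughout: this is not the bigframes or Ibis API.
# sqlite3 stands in for the backend that a real Session would create.


class Session:
    """Owns the backend connection; the only object that executes queries."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self.queries_run = 0  # counts round-trips to the backend

    def table(self, name: str) -> LazyFrame:
        # Creating a frame only records a query; nothing is executed.
        return LazyFrame(self, f"SELECT * FROM {name}", ())

    def run(self, sql: str, params: tuple) -> list[dict]:
        # Single point of contact with the backend.
        self.queries_run += 1
        cur = self._conn.execute(sql, params)
        columns = [d[0] for d in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]


class LazyFrame:
    """Deferred frame: each operation wraps the previous query in a new one."""

    _OPS = {"=", "<", ">", "<=", ">="}

    def __init__(self, session: Session, sql: str, params: tuple) -> None:
        self._session = session
        self._sql = sql
        self._params = params

    def where(self, column: str, op: str, value: object) -> LazyFrame:
        if op not in self._OPS:
            raise ValueError(f"unsupported operator: {op}")
        # Column names are interpolated, so they are assumed trusted here;
        # values are passed as bound parameters.
        sql = f"SELECT * FROM ({self._sql}) AS sub WHERE {column} {op} ?"
        return LazyFrame(self._session, sql, self._params + (value,))

    def select(self, *columns: str) -> LazyFrame:
        sql = f"SELECT {', '.join(columns)} FROM ({self._sql}) AS sub"
        return LazyFrame(self._session, sql, self._params)

    def collect(self) -> list[dict]:
        # Execution happens only here, playing the role of to_pandas().
        return self._session.run(self._sql, self._params)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)", [("EU", 10), ("US", 25), ("EU", 40)]
)

session = Session(conn)
large_eu = (
    session.table("sales")
    .where("region", "=", "EU")
    .where("amount", ">", 20)
    .select("amount")
)
print(session.queries_run)  # 0: only the query has been built
print(large_eu.collect())   # [{'amount': 40}]
print(session.queries_run)  # 1
```

In this sketch, `Session.run` is the only code that touches the database, so pointing it at a different engine would be a local change; that is roughly the property that makes the design look adaptable beyond one backend.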
There are bits and pieces with specific BigQuery-related limitations, but they look pretty easy to pick out when making it more generic.

datajoely
08/30/2023, 3:00 PM

Juan Luis
08/30/2023, 3:01 PM

datajoely
08/30/2023, 3:01 PM

Cody Peterson
08/30/2023, 3:07 PM

datajoely
08/30/2023, 3:13 PM

Deepyaman Datta
08/30/2023, 3:26 PM
> we're moving toward a world where each cloud data platform has their own similar, but slightly different dataframe implementation (PySpark dataframes or pandas on PySpark for Databricks/Synapse, Snowpark for Snowflake, Bigframes for BigQuery, etc.)
Snowpark is a bit different, in that it isn't really a pandas API (as far as I know). `pyspark.pandas` and BigFrames both try to stay as close to the pandas API as possible. The whole `pyspark.pandas` test suite checks equivalence to pandas operations/syntax; BigFrames vendors pandas itself so it doesn't need to rewrite docstrings 😂 (`pyspark.pandas` is not as fancy about it, but most of its docstrings are copied word-for-word).
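To make the equivalence-testing idea concrete, here is a hypothetical sketch of a small pandas-flavoured layer over a SQL engine: pandas-style indexing (`df["col"]`, `df[mask]`, `df[["a", "b"]]`) compiles to a deferred query, and the result is checked against a plain-Python reference of the same operation. `Predicate`, `Column`, `Frame`, and `to_records` are illustrative names only, `sqlite3` stands in for the execution engine, and pandas itself is not used.

```python
from __future__ import annotations

import operator
import sqlite3

# Hypothetical pandas-flavoured facade; not the API of pandas, Ibis, or bigframes.


class Predicate:
    """A boolean mask expressed as a SQL fragment plus bound parameters."""

    def __init__(self, sql: str, params: tuple) -> None:
        self.sql = sql
        self.params = params

    def __and__(self, other: Predicate) -> Predicate:
        # Mirrors pandas' `mask1 & mask2`.
        return Predicate(f"({self.sql}) AND ({other.sql})", self.params + other.params)


class Column:
    """Result of df["name"]; comparisons build predicates instead of evaluating."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, value: object) -> Predicate:  # type: ignore[override]
        return Predicate(f"{self.name} = ?", (value,))

    def __gt__(self, value: object) -> Predicate:
        return Predicate(f"{self.name} > ?", (value,))

    def __lt__(self, value: object) -> Predicate:
        return Predicate(f"{self.name} < ?", (value,))


class Frame:
    """Deferred frame supporting df["col"], df[mask] and df[["a", "b"]]."""

    def __init__(self, conn: sqlite3.Connection, sql: str, params: tuple = ()) -> None:
        self._conn = conn
        self._sql = sql
        self._params = params

    def __getitem__(self, key):
        if isinstance(key, str):
            return Column(key)
        if isinstance(key, list):
            cols = ", ".join(key)
            return Frame(self._conn, f"SELECT {cols} FROM ({self._sql}) AS sub", self._params)
        if isinstance(key, Predicate):
            sql = f"SELECT * FROM ({self._sql}) AS sub WHERE {key.sql}"
            return Frame(self._conn, sql, self._params + key.params)
        raise TypeError(f"unsupported key: {key!r}")

    def to_records(self) -> list[dict]:
        # The query runs only when results are requested.
        cur = self._conn.execute(self._sql, self._params)
        columns = [d[0] for d in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]


rows = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 25},
    {"region": "EU", "amount": 40},
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (:region, :amount)", rows)

df = Frame(conn, "SELECT * FROM sales")
result = df[(df["region"] == "EU") & (df["amount"] > 20)][["amount"]].to_records()

# Equivalence check against a plain-Python reference of the same operation.
expected = [{"amount": r["amount"]} for r in rows if r["region"] == "EU" and r["amount"] > 20]
by_amount = operator.itemgetter("amount")  # row order is not guaranteed without ORDER BY
assert sorted(result, key=by_amount) == sorted(expected, key=by_amount)
print(result)  # [{'amount': 40}]
```

The reference here is a list comprehension rather than pandas, so the check only illustrates the shape of such a test, not how `pyspark.pandas` actually structures its suite.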
But if more vendors/projects down the line want that pandas-equivalent API layer, I can definitely see them using this + Ibis, since it's a lot of extra work/maintenance that most teams would love to avoid. So I think/hope we'd see not-so-different dataframe implementations in the end 🙂

Cody Peterson
08/30/2023, 3:30 PM