# questions
m
I want to use PySpark in my Databricks environment, but Polars locally. Any recommendation on how to manage different hooks by environment? The quickest thing I can see is to put the hook logic inside an if statement and use context.env to determine whether to run. Something like:
from kedro.framework.hooks import hook_impl


class SparkHooks:
    @hook_impl
    def after_context_created(self, context) -> None:
        # Only initialise Spark in the production (Databricks) environment
        if context.env == "production":
            from pyspark import SparkConf
            from pyspark.sql import SparkSession

            # Load Spark settings from the project's spark.yml config
            parameters = context.config_loader["spark"]
            spark_conf = SparkConf().setAll(parameters.items())

            spark_session_conf = (
                SparkSession.builder.appName(context.project_path.name)
                .enableHiveSupport()
                .config(conf=spark_conf)
            )
            _spark_session = spark_session_conf.getOrCreate()
            _spark_session.sparkContext.setLogLevel("WARN")
Is there a different / preferred way to accomplish this?
d
So I think Ibis is the modern way to approach this since you can use the same query syntax on both ends https://kedro.org/blog/building-scalable-data-pipelines-with-kedro-and-ibis
we’re working on more native support currently, but I’m pretty bullish on this being the pattern of the future where one can swap out back-ends at the drop of a hat
m
Agreed, we're trying to do this with ibis. Might not be a great plan since native support isn't in place yet, but we were giving it a shot 🤷‍♂️ We are using ibis.TableDataset w/ polars for local development and are trying to see if we can use the same approach to create an ibis.ManagedTableDataset for the databricks environment
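For reference, a sketch of what that per-environment catalog split might look like. These entries are hypothetical: `ibis.TableDataset` is in kedro-datasets, but the exact parameters (`connection`, `backend`) are assumptions that may differ by version, and the production entry is a guess at the Databricks side rather than a confirmed setup.

```yaml
# conf/local/catalog.yml -- Polars backend for local development
orders:
  type: ibis.TableDataset
  filepath: data/01_raw/orders.parquet
  file_format: parquet
  connection:
    backend: polars
```

```yaml
# conf/production/catalog.yml -- same dataset name, PySpark backend on Databricks
orders:
  type: ibis.TableDataset
  table_name: orders
  connection:
    backend: pyspark
```

Because both environments define the same dataset name, pipeline code stays unchanged and `kedro run --env=production` swaps the backend.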
j
to your original question @Mark Druffel, I don't think we have a way to specify hooks per environment (because there's no per-environment settings.py), so your solution seems like a decent workaround
d
That’s awesome - please let us know how it goes with Ibis as we’d appreciate help getting databricks support in our forthcoming implementation
m
Nvm I'm an idiot and missed an error, disregard
d
💪 best kind of user support, glad it got fixed