# questions
f
Hello! As time goes by, I get more and more warnings when running `kedro run` and `kedro viz run`:
```
Starting Kedro Viz ...
Unable to get Charset 'cp65001' for property 'sun.stderr.encoding', using default windows-1252 and continuing.
Unable to get Charset 'cp65001' for property 'sun.stderr.encoding', using default windows-1252 and continuing.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/01/24 09:38:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/01/24 09:38:29 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
```
The more warnings I get, the longer the commands take to run. It now takes more than a minute for Kedro-Viz to render. 🫠 I was wondering whether any of you have already encountered these warnings and, if so, how you fixed them.

Edit: I updated Hadoop to 3.3.6 (from 3.3.0), and now I have only one warning left:
```
Starting Kedro Viz ...
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/01/24 11:39:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
```
n
Spark shouldn't even be initialised when you start viz
f
Do you think that an old version of Java (1.8.0_101) could slow down the rendering of the commands?
n
I don't think Spark/Java should even be initialised. Since you are just starting Viz, there shouldn't be any data loaded from Spark, right?
f
I don't even use Spark in my code (Spark is not installed in my conda environment), so I would say no.
n
Do you have any idea where that log is coming from? It's not from Kedro.
f
It looks like it comes from Spark 🤔
n
But you mentioned you are not using Spark at all? Can you check your `catalog.yml` to see if there are any Spark datasets, or any external package that might be pulling Spark in?
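One quick way to find packages that hook into Kedro without being mentioned in `catalog.yml` is to list the entry points in the `kedro.hooks` group, which is the group Kedro scans to auto-register plugin hooks. A minimal sketch, assuming the plugin registers its hooks through that entry-point group; the helper name `kedro_hook_plugins` is hypothetical:

```python
# Hypothetical helper: list installed packages that register Kedro hooks
# automatically through the "kedro.hooks" entry-point group.
from importlib.metadata import entry_points
from typing import List, Tuple


def kedro_hook_plugins(group: str = "kedro.hooks") -> List[Tuple[str, str]]:
    """Return sorted (name, "module:attribute") pairs for entry points in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: the returned object supports .select(group=...)
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group name
        selected = eps.get(group, [])
    # Deduplicate and sort so the output is stable
    return sorted({(ep.name, ep.value) for ep in selected})


if __name__ == "__main__":
    for name, target in kedro_hook_plugins():
        print(f"{name}: {target}")
```

Hooks registered directly in the project's `settings.py` via `HOOKS` won't appear in this list, since they are not installed as entry points.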
f
Oh yes, I have this in `spark.yml`:
```yaml
# You can define spark specific configuration here.

spark.driver.maxResultSize: 3g
spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
spark.sql.execution.arrow.pyspark.enabled: true

# https://docs.kedro.org/en/stable/integrations/pyspark_integration.html#tips-for-maximising-concurrency-using-threadrunner
spark.scheduler.mode: FAIR
```
I think it's because I installed kedro-mlflow and ended up not using it.
n
Can you try `kedro viz run --ignore-plugins`? I think this option was only added recently; I am using the latest version myself (7.1.0).
f
Yes, it works! And I don't get the Spark warning anymore, so I guess I need to uninstall kedro-mlflow. Now it takes about 30 seconds for Kedro-Viz to render.
n
Yep, if you don't need it, just uninstall it. Alternatively, you can disable it in `settings.py` if you are sharing the same conda env across different projects.
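For reference, Kedro's `settings.py` accepts a `DISABLE_HOOKS_FOR_PLUGINS` tuple listing plugins whose auto-registered hooks should be skipped for that project. A minimal sketch, assuming the plugin's distribution name is `kedro-mlflow`:

```python
# settings.py (sketch): keep kedro-mlflow installed in the shared conda env,
# but stop Kedro from auto-registering its hooks for this project only.
DISABLE_HOOKS_FOR_PLUGINS = ("kedro-mlflow",)
```

This only prevents the plugin's hooks from being registered; the package itself stays installed and available to other projects using the same env.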
f
Perfect, thanks a lot!
I also found this in `src/project_name/hooks.py`:
```python
from kedro.framework.hooks import hook_impl
from pyspark import SparkConf
from pyspark.sql import SparkSession


class SparkHooks:
    @hook_impl
    def after_context_created(self, context) -> None:
        """Initialises a SparkSession using the config
        defined in project's conf folder.
        """

        # Load the spark configuration in spark.yml using the config loader
        parameters = context.config_loader["spark"]
        spark_conf = SparkConf().setAll(parameters.items())

        # Initialise the spark session
        spark_session_conf = (
            SparkSession.builder.appName(context.project_path.name)
            .enableHiveSupport()
            .config(conf=spark_conf)
        )
        _spark_session = spark_session_conf.getOrCreate()
        _spark_session.sparkContext.setLogLevel("WARN")
```
I've never defined any hooks, so I guess this was created by some command I ran.
So I just commented out these lines in `settings.py`, and now Kedro-Viz works just fine (though it still takes ~35 seconds to render):
```python
# Instantiated project hooks.
from ibc_codes.hooks import SparkHooks  # comment that out

# Hooks are executed in a Last-In-First-Out (LIFO) order.
HOOKS = (SparkHooks(),)  # comment that out
```
n
When you create a new project, the prompt asks which extra tools you need. The default is none, so you probably chose `all` or `spark`?
I tested it on my end, and it takes about 15-20 seconds to start rendering. It's not exactly fast, but it's reasonable to me. Do you need to keep restarting the process for some reason? It would be great if you could elaborate. Cc @Nero Okwa
f
Yes, I chose `all`! I'll try with the default, thanks!
n
Thanks @Francis Duval for the feedback. Surprised that Kedro-Viz takes up to 35 seconds to render. How many nodes do you have in your project? CC @Rashida Kanchwala
f
48 nodes, 74 datasets, but it was rendering slowly even when my project was small!
n
CC @Rashida Kanchwala & @Stephanie Kaiser
r
We are releasing a fix for this. In the meantime, can you try `kedro viz run --ignore-plugins`?