# questions

Francis Duval

01/24/2024, 3:28 PM
Hello! As time goes by, I'm getting more and more warnings when running `kedro run` and `kedro viz run`:
```text
Starting Kedro Viz ...
Unable to get Charset 'cp65001' for property 'sun.stderr.encoding', using default windows-1252 and continuing.
Unable to get Charset 'cp65001' for property 'sun.stderr.encoding', using default windows-1252 and continuing.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/01/24 09:38:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/01/24 09:38:29 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
```
The more warnings I get, the longer the commands take to run. It now takes more than a minute for kedro viz to render. 🫠 I was wondering if any of you had already encountered one of these and, if so, how you fixed them.

Edit: I updated Hadoop to 3.3.6 (from 3.3.0), and now I have only one warning left:
```text
Starting Kedro Viz ...
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/01/24 11:39:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
```

Nok Lam Chan

01/24/2024, 4:36 PM
Spark shouldn't even be initialised when you start viz

Francis Duval

01/24/2024, 4:52 PM
Do you think that an old version of Java (1.8.0_101) could slow down the rendering of the commands?

Nok Lam Chan

01/24/2024, 4:52 PM
I don't think Spark/Java should even be initialised.
As you are just starting viz, there shouldn't be any data loaded from Spark, right?

Francis Duval

01/24/2024, 4:55 PM
I don't even use Spark in my code (Spark is not installed in my conda environment), so I would say no.

Nok Lam Chan

01/24/2024, 4:57 PM
Do you have any idea where that log is coming from? It's not from Kedro.

Francis Duval

01/24/2024, 5:09 PM
It looks like it comes from Spark 🤔

Nok Lam Chan

01/24/2024, 5:11 PM
But you mentioned you are not using Spark at all? Can you check your `catalog.yml` to see if there are any Spark datasets, or an external package that might be pulling Spark in?

Francis Duval

01/24/2024, 5:12 PM
Oh yes, I have this in `spark.yml`:
```yaml
# You can define spark specific configuration here.

spark.driver.maxResultSize: 3g
spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
spark.sql.execution.arrow.pyspark.enabled: true

# https://docs.kedro.org/en/stable/integrations/pyspark_integration.html#tips-for-maximising-concurrency-using-threadrunner
spark.scheduler.mode: FAIR
```
I think it's because I installed kedro-mlflow and in the end didn't use it.

Nok Lam Chan

01/24/2024, 5:16 PM
Can you try `kedro viz run --ignore-plugins`? I think this option was only added recently; I'm using the latest version myself (7.1.0).

Francis Duval

01/24/2024, 5:19 PM
Yes it works! And I don't have the Spark warning anymore. So I guess I need to uninstall kedro-mlflow. Now it takes about 30 seconds for kedro viz to render.

Nok Lam Chan

01/24/2024, 5:26 PM
Yep, if you don't need it, just uninstall it. Alternatively, you can disable it in `settings.py` if you are sharing the same conda env across different projects.
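For reference, Kedro reads a `DISABLE_HOOKS_FOR_PLUGINS` setting from the project's `settings.py`; it switches off the hooks that an installed plugin registers automatically, while leaving the plugin installed for other projects in the same environment. A minimal sketch, assuming the plugin's distribution name is `kedro-mlflow`:

```python
# src/<package_name>/settings.py (excerpt)
# Plugins listed here stay installed, but Kedro does not auto-register
# the hooks they expose through their entry points.
DISABLE_HOOKS_FOR_PLUGINS = ("kedro-mlflow",)
```

This only affects plugin hooks; project hooks declared in `HOOKS` are still registered.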

Francis Duval

01/24/2024, 5:28 PM
Perfect, thanks a lot!
I also found this in `src/project_name/hooks.py`:
```python
from kedro.framework.hooks import hook_impl
from pyspark import SparkConf
from pyspark.sql import SparkSession


class SparkHooks:
    @hook_impl
    def after_context_created(self, context) -> None:
        """Initialises a SparkSession using the config
        defined in project's conf folder.
        """

        # Load the spark configuration in spark.yaml using the config loader
        parameters = context.config_loader["spark"]
        spark_conf = SparkConf().setAll(parameters.items())

        # Initialise the spark session
        spark_session_conf = (
            SparkSession.builder.appName(context.project_path.name)
            .enableHiveSupport()
            .config(conf=spark_conf)
        )
        _spark_session = spark_session_conf.getOrCreate()
        _spark_session.sparkContext.setLogLevel("WARN")
```
I've never defined any hook, so I guess it was created by some command I ran.
So I just commented out these lines in `settings.py`, and now kedro viz works just fine (but it still takes ~35 seconds to render):
```python
# Instantiated project hooks.
from ibc_codes.hooks import SparkHooks  # comment that out

# Hooks are executed in a Last-In-First-Out (LIFO) order.
HOOKS = (SparkHooks(),)  # comment that out
```

Nok Lam Chan

01/25/2024, 11:21 AM
When you create a new project, the prompt asks which extra tools you need. The default is none, so you probably chose `all` or PySpark?
I tested it on my end, and it takes about 15-20 seconds to start rendering. It's not exactly fast, but it seems reasonable to me. Do you need to keep restarting the process for some reason? It would be great if you could elaborate. Cc @Nero Okwa

Francis Duval

01/26/2024, 5:48 PM
Yes, I chose `all`! I'll try with the default, thanks!

Nero Okwa

02/01/2024, 11:41 AM
Thanks @Francis Duval for the feedback. I'm surprised that Kedro-Viz takes up to 35 seconds to render. How many nodes do you have in your project? CC @Rashida Kanchwala

Francis Duval

02/01/2024, 3:07 PM
48 nodes, 74 datasets, but it was rendering slowly even when my project was small!

Nero Okwa

03/26/2024, 11:09 AM
CC @Rashida Kanchwala & @Stephanie Kaiser

Rashida Kanchwala

03/26/2024, 11:24 AM
We are releasing a fix for this. In the meantime, can you try running `kedro viz run --ignore-plugins`?