# questions
p
Hey team! While trying the Kedro 0.19.10 starter (spaceflights-pyspark), I am trying to access the `package_name` in the ProjectContext I create, but it's being initialized as `None`. It looks like we don't set this variable in the `kedro run` CLI flow, but in the configure-project flow we do set the package name. Is this intended?
👀 1
h
Someone will reply to you shortly. In the meantime, this might help:
n
ProjectContext has been removed since 0.18, so how do you set this? Or maybe you can just use the KedroContext to access this via a hook instead.
p
Hey Nok! Thanks for replying. I am creating a custom ProjectContext and using this to set the app name for spark context initialization
Are we no longer allowed to create a ProjectContext as of Kedro 0.19?
n
How are you creating it?
p
# settings.py
from package.context import ProjectContext
CONTEXT_CLASS = ProjectContext

# context.py
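from pathlib import Path
from typing import Any, Union

from kedro.config import AbstractConfigLoader
from kedro.framework.context import KedroContext
from pluggy import PluginManager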

class ProjectContext(KedroContext):
    def __init__(
        self,
        package_name: str,
        project_path: Union[Path, str],
        config_loader: AbstractConfigLoader,
        hook_manager: PluginManager,
        env: str = None,
        extra_params: dict[str, Any] = None,
    ):
        super().__init__(
            package_name=package_name,
            project_path=project_path,
            config_loader=config_loader,
            hook_manager=hook_manager,
            env=env,
            extra_params=extra_params,
        )
        self.package_name = package_name

    def init_spark_session(self) -> None:
        # I try to read self._package_name and self.package_name here,
        # but I get None for both.
        ...
👀 1
n
I am surprised this happened; it looks like a bug to me.
I can reproduce this on the latest `main`.
@Juan Luis maybe this is something we need to fix while cleaning up the Session/Context? It seems that it doesn't even work with a `hook`. I am surprised, since this means the telemetry data will be skewed too.
@Puneet Saini Do you actually need the `package_name` for specific logic? I remember it was removed from `KedroSession` because it's not necessary anymore; it looks like we didn't clean up properly.
p
Yeah, I think it would work fine if run with `__main__.py`.
But for the CLI it doesn't work.
n
how do you run it?
p
I run using cli
n
So you are using the package name as a spark session name?
p
Yes sir
n
Can you open an issue? I am pretty sure this is a bug. I am not sure why `kedro run` doesn't work but running `__main__.py` works.
One quick workaround is that you can use the PACKAGE_NAME global directly.
from kedro.framework.project import PACKAGE_NAME
print(PACKAGE_NAME)
As long as this is imported after the session has been created, it should pick up the package name.
p
So I might be wrong, but I skimmed through the code and it seems like this global variable is not being set when triggered from the CLI. From `__main__.py` I suspect it would be (I haven't tried the main approach, I'm just going by the code).
n
One caveat if you want to keep this as a top-level import: don't import `PACKAGE_NAME` directly, because in Python this creates a new variable (so it no longer points to the global). If you do `from kedro.framework import project` and use `project.PACKAGE_NAME`, you will be accessing the global correctly.
👍 1
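A minimal sketch of the import caveat above, using a hypothetical stand-in module (`fake_project`, with a hypothetical `configure` function) rather than Kedro itself, so it runs without a Kedro project. It assumes only what the thread states: the global starts out unset and is rebound later during project configuration.

```python
import sys
import types

# Hypothetical stand-in for a framework module that exposes a module-level
# global which is rebound later (the thread says configure_project does this
# for PACKAGE_NAME in kedro.framework.project).
fake_project = types.ModuleType("fake_project")
fake_project.PACKAGE_NAME = None
sys.modules["fake_project"] = fake_project


def configure(name: str) -> None:
    """Rebind the module-level global, as a framework setup step might."""
    fake_project.PACKAGE_NAME = name


# Top-level imports, executed before the framework has been configured.
from fake_project import PACKAGE_NAME  # binds a local name to the current value (None)
import fake_project as project  # keeps a reference to the module itself

configure("spaceflights")

stale_value = PACKAGE_NAME  # still None: the local name was bound only once
live_value = project.PACKAGE_NAME  # "spaceflights": looked up at access time
```

The attribute lookup on the module happens each time `project.PACKAGE_NAME` is evaluated, which is why it sees the later rebinding while the directly imported name does not.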
p
I am doing this right now:
self.package_name = package_name or Path(__file__).parent.name
n
This is fine, as long as your package name is the same as the directory name.
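As a rough illustration of the fallback discussed above, the same expression can be pulled into a small helper. The function name `resolve_package_name` and the example path are hypothetical, and the sketch assumes the file sits directly inside the package directory.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_package_name(
    package_name: Optional[str], module_file: Union[str, Path]
) -> str:
    """Return package_name if it is set, else the name of module_file's parent folder."""
    return package_name or Path(module_file).parent.name


# Hypothetical layout: src/spaceflights/context.py
example_file = Path("src") / "spaceflights" / "context.py"

from_fallback = resolve_package_name(None, example_file)  # uses the folder name
from_argument = resolve_package_name("my_package", example_file)  # uses the argument
```

Inside the custom context this would be called with `__file__`, and it only gives the right answer when the folder name matches the package name.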
j
The `PACKAGE_NAME` is set in `configure_project`: https://github.com/kedro-org/kedro/blob/34e65ae8fa2843353b6db8c6de946b508a98634d/kedro/framework/project/__init__.py#L311-L312 (sorry if this has been mentioned before, I skimmed the conversation quickly)
n
@Juan Luis Yes, `PACKAGE_NAME` is set correctly, but the `_package_name` in `KedroContext` is not.
Not sure which one we track in telemetry.
p
Does CLI call configure_project?
👍🏼 1
j
Indeed, I don't see `KedroContext._package_name` ever being reassigned... dead code maybe? Should we open an issue about it?
👍🏼 1