Hi kedronistas :) I have a question about customiz...
# questions
f
Hi kedronistas :) I have a question about customization: In my company we have our own cookiecutter template, i.e. our own folder structure and naming conventions. I am struggling to integrate kedro capabilities into our existing template. E.g. we don't follow the "src" naming convention but use "<pkg_name>" instead. How can I configure kedro so it knows about that and looks e.g. for the kedro_cli.py in there? So to wrap up: is it possible, and if yes, what is best practice, to configure kedro to integrate well into an existing repo structure without losing kedro functionality?
👍 1
👀 1
d
Hey @fmfreeze; I was actually just looking at this due to your comment on another thread. In short, I don't think it's straightforward to maintain Kedro CLI functionality while changing your project layout.
The project template is pretty opinionated about how it structures the project, and the CLI leverages that. You can configure the source directory location~, but it is still going to require a `src` folder under there I think~.
You can, of course, use the underlying runner, pipelines, nodes, etc. in any project structure.
j
I think it's worth trying it out. if I understand correctly, `[tool.kedro]` in `pyproject.toml` would remain untouched, and as long as there's a `settings.py`, everything should be fine? but I might be missing things
you're right @Deepyaman Datta, there seems to be something hardcoded to `src` in the `ProjectMetadata`. but maybe it shouldn't have to be that way 🤔
🤔 1
you can configure the `source_dir` in `pyproject.toml`, it turns out!
👍 1
🥳 1
👍🏼 1
f
Thank you for those super-quick responses :) Having no CLI functionality out of the box and running pipelines in Python is OK for me. But when it comes to kedro-viz, things might get even trickier?! So if no way exists (yet?) to tell kedro which files are stored where, I think I have to start finding arguments to convince my boss why we should rearrange future project structures around kedro (and not vice versa, as intended).
I'll try that asap, thank you 👍
d
> You can configure the source directory location, but it is still going to require a `src` folder under there I think.
I misspoke on this/misread the code; it will default to `src` if `source_dir` isn't provided, but you should be able to override it.
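A minimal sketch of the fallback described above, assuming the `[tool.kedro]` table from `pyproject.toml` has already been parsed into a dict. `resolve_source_dir` is a hypothetical helper written for illustration; it is not part of Kedro's API and only mirrors the behaviour as described in this thread.

```python
from pathlib import Path


def resolve_source_dir(project_path: Path, tool_kedro: dict) -> Path:
    """Return the directory expected to contain the project package.

    Hypothetical helper: falls back to "src" when `source_dir` is absent.
    """
    source_dir = tool_kedro.get("source_dir", "src")
    return (Path(project_path) / source_dir).resolve()


project = Path("spaceflights-project")  # assumed project root, for illustration
# No source_dir key: resolves to <project>/src
print(resolve_source_dir(project, {"package_name": "spaceflights"}))
# Explicit override: resolves to the project root itself
print(resolve_source_dir(project, {"package_name": "spaceflights", "source_dir": "."}))
```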
> But when it comes to kedro-viz, things might get even trickier?!
I don't think Kedro-Viz will be any worse; it basically also delegates the context-loading to `kedro.framework.startup.bootstrap_project`, so as long as that works, you could be fine.
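For running pipelines from plain Python without the CLI, a sketch along these lines may work, assuming Kedro is installed and `project_path` points at the directory holding `pyproject.toml`. `run_default_pipeline` is a hypothetical wrapper name; `bootstrap_project` and `KedroSession` are the Kedro entry points it relies on.

```python
from pathlib import Path


def run_default_pipeline(project_path: Path):
    """Bootstrap a Kedro project and run its default pipeline."""
    # Imported lazily so this snippet can be loaded without Kedro installed.
    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    # Reads [tool.kedro] from pyproject.toml (including source_dir, if set)
    # and makes the project package importable.
    bootstrap_project(project_path)
    with KedroSession.create(project_path=project_path) as session:
        return session.run()
```

Calling `run_default_pipeline(Path.cwd())` from the project root is then roughly what `kedro run` does; if this call fails to import the package, Kedro-Viz will most likely fail in the same way.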
j
in other words:
```diff
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,7 @@
 [tool.kedro]
 package_name = "spaceflights"
 project_name = "Spaceflights"
+source_dir = "spaceflights"
 kedro_init_version = "0.18.8"
 
 [tool.isort]
```
this moves you from a `src` layout to a "flat" layout (instead of `src/spaceflights/settings.py`, you'd have `spaceflights/settings.py`)
👍 2
👍🏼 1
f
@Juan Luis I tried your suggestion. Adding the "source_dir" works fine for renaming the "src" folder, but I cannot reproduce your "flat hierarchy" statement. In the spaceflights example, what do you change in the file-/folder structure to get a flat(ter) hierarchy?
j
@fmfreeze I did `mv src/spaceflights/ .`, hence having a `spaceflights` directory at the root of the project. is that what you were attempting?
f
It is not exactly what I was attempting, but I cannot reproduce that either (`ModuleNotFoundError: No module named spaceflights`).
j
interesting, that happened to me yesterday the first time I tried it, and now I'm reproducing it again. while I figure this out, @fmfreeze what would be your ideal `tree` output? here's mine at the moment:
```
.
├── README.md
├── conf
│   ├── ...
├── data
│   ├── ...
├── pyproject.toml
├── setup.cfg
└── spaceflights
    ├── __init__.py
    ├── __main__.py
    ├── __pycache__
    ├── pipeline_registry.py
    ├── pipelines
    └── settings.py
```
f
Thx @Juan Luis for helping out. Your tree suggestion would be nice already. An ideal solution would be even more flexible, to allow a more granular structure. E.g. having settings.py & pipeline_registry.py in their own kedro subfolder. But that would just be a nice-to-have to avoid confusion among our scientists (who are not SW devs, and don't have to be :)
j
okay, I was sending you the wrong path - this is the actual `source_dir` for my tree above:
```diff
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,7 @@
 [tool.kedro]
 package_name = "spaceflights"
 project_name = "Spaceflights"
+source_dir = "."
 kedro_init_version = "0.18.8"
```
(by using `source_dir = "spaceflights"` Kedro was looking in the wrong place - I'm not sure how I got it working yesterday)
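One way to see why `"."` works and `"spaceflights"` does not is the sketch below. It assumes Kedro looks for the package at `<project>/<source_dir>/<package_name>`, which matches the behaviour described in this thread; `package_is_findable` is a hypothetical helper, not a Kedro function.

```python
import tempfile
from pathlib import Path


def package_is_findable(project_path: Path, source_dir: str, package_name: str) -> bool:
    """Check whether <project>/<source_dir>/<package_name>/__init__.py exists."""
    return (project_path / source_dir / package_name / "__init__.py").is_file()


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    # Flat layout: the package sits directly at the project root.
    (project / "spaceflights").mkdir()
    (project / "spaceflights" / "__init__.py").touch()

    # source_dir = "." looks for ./spaceflights
    flat_ok = package_is_findable(project, ".", "spaceflights")
    # source_dir = "spaceflights" looks for ./spaceflights/spaceflights
    nested_ok = package_is_findable(project, "spaceflights", "spaceflights")

print(flat_ok, nested_ok)  # True False
```

With `source_dir = "spaceflights"` the lookup goes one level too deep, which is consistent with the `ModuleNotFoundError` reported earlier.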
> An ideal solution would be even more flexible to achieve a more granular structure.
> E.g. having settings.py & pipeline_registry.py in its own kedro subfolder.
I agree it would be nice indeed. at the moment it's hardcoded: https://github.com/kedro-org/kedro/blob/2e70dec396567d2ba38456179aad0d4ae8e83b31/kedro/framework/project/__init__.py#L256-L259 would you want to open an issue about it @fmfreeze? (no guarantees about it, but at least might serve to spark broader discussion)
f
j
thanks for your patience @fmfreeze!
👍 1
f
What is the part you mean by hardcoded? What would you change?
j
what I meant is that the code I highlighted is part of Kedro and is what prevents users from customizing where `settings.py` and `pipeline_registry.py` live - so, to allow for the customization you proposed @fmfreeze, Kedro would need to make those lines more generic and allow for some configuration, similar to what `source_dir` does. does this make sense?
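Purely as an illustration of what "more generic" could mean, here is a hypothetical sketch. Neither the `kedro_module_names` function nor the `overrides` mapping exists in Kedro; per the thread, the module names are currently derived from the package name alone.

```python
from typing import Dict, Optional


def kedro_module_names(
    package_name: str, overrides: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Return dotted module names for settings and the pipeline registry.

    Hypothetical: defaults follow the <package>.settings and
    <package>.pipeline_registry convention; overrides replace them.
    """
    defaults = {
        "settings": f"{package_name}.settings",
        "pipeline_registry": f"{package_name}.pipeline_registry",
    }
    return {**defaults, **(overrides or {})}


# Default convention
print(kedro_module_names("spaceflights"))
# Imagined override placing settings.py in a "kedro" subpackage
print(kedro_module_names("spaceflights", {"settings": "spaceflights.kedro.settings"}))
```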
f
Yes, it makes sense and would be fine. Your suggestion with `source_dir` also works nicely. 👍
j
great to know! 🥳