Yetunde 08/17/2022, 11:03 AM
Yetunde 08/24/2022, 11:30 AM
Yetunde 08/26/2022, 2:35 PM
Asch Harwood 10/28/2022, 1:37 PM
Filip Panovski 11/10/2022, 1:36 PM
I had a use case where I needed to parse an existing Avro schema and transform it into a pyarrow schema so that the function behaves nicely. I was wondering whether this is something the community would be interested in, and I would appreciate any feedback.
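As a rough illustration of the Avro-to-pyarrow conversion described above, here is a minimal sketch. It assumes a flat Avro `record` schema with primitive field types and treats `["null", T]` unions as nullable fields; nested records, arrays, maps, enums and logical types are deliberately left unimplemented. The helper names (`avro_record_fields`, `to_pyarrow_schema`) are hypothetical and not part of any library. Only `to_pyarrow_schema` needs pyarrow, which it imports lazily.

```python
from __future__ import annotations

import json
from typing import Any

# Avro primitive type -> pyarrow type alias (a name accepted by pyarrow.type_for_alias)
AVRO_TO_ARROW_ALIAS = {
    "boolean": "bool",
    "int": "int32",      # Avro int is 32-bit
    "long": "int64",     # Avro long is 64-bit
    "float": "float",    # 32-bit float in both formats
    "double": "double",
    "bytes": "binary",
    "string": "string",
}


def _resolve(avro_type: Any) -> tuple[str, bool]:
    """Return (arrow_alias, nullable) for one Avro field type."""
    if isinstance(avro_type, list):
        # Only the common "optional" union ["null", T] (in either order) is handled.
        non_null = [t for t in avro_type if t != "null"]
        if len(non_null) != 1:
            raise NotImplementedError(f"unsupported union: {avro_type!r}")
        alias, _ = _resolve(non_null[0])
        return alias, "null" in avro_type
    if isinstance(avro_type, dict):
        # {"type": "string"} is equivalent to "string"; richer types are out of scope.
        if "logicalType" in avro_type:
            raise NotImplementedError(f"logical types not handled: {avro_type!r}")
        return _resolve(avro_type["type"])
    if avro_type in AVRO_TO_ARROW_ALIAS:
        return AVRO_TO_ARROW_ALIAS[avro_type], False
    raise NotImplementedError(f"unsupported Avro type: {avro_type!r}")


def avro_record_fields(schema_json: str) -> list[tuple[str, str, bool]]:
    """Flatten a top-level Avro record into (name, arrow_alias, nullable) triples."""
    schema = json.loads(schema_json)
    if schema.get("type") != "record":
        raise ValueError("expected a top-level Avro record schema")
    return [(f["name"], *_resolve(f["type"])) for f in schema["fields"]]


def to_pyarrow_schema(schema_json: str):
    """Build a pyarrow.Schema from the triples (requires pyarrow to be installed)."""
    import pyarrow as pa  # lazy import: the parsing step above is pure stdlib

    return pa.schema(
        [
            pa.field(name, pa.type_for_alias(alias), nullable=nullable)
            for name, alias, nullable in avro_record_fields(schema_json)
        ]
    )


schema = json.dumps({
    "type": "record",
    "name": "Station",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": ["null", "string"]},
        {"name": "temp", "type": "double"},
    ],
})
print(avro_record_fields(schema))
# → [('id', 'int64', False), ('name', 'string', True), ('temp', 'double', False)]
```

Splitting the work into a stdlib-only parsing step and a thin pyarrow step keeps the type mapping easy to test on its own; unsupported Avro constructs fail loudly instead of being silently coerced.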
Sean Westgate 11/28/2022, 3:42 PM
) will be discontinued with version 0.19, I wonder if a plugin should fill its place. @Asch Harwood suggested that there is a need for communicating with non-technical stakeholders. Are there more users thinking this way? Would a plugin that assists with the creation of documentation be useful?
To aid discussion, I played around with a prototype static site as an example of project documentation. I used the Kedro spaceflights tutorial project as a base, and you can explore the finished documentation here.
Given that the Kedro framework defines much of the information needed for project documentation, I think it would be pretty straightforward to create a plugin that would:
- create the basic documentation structure
- fill in details about pipelines, nodes, data and parameters automatically
- insert an interactive Kedro-Viz graph
- provide empty templates for additional notes
- give an example of how to publish the docs as a static website, for example to GitHub Pages
Just to clarify, this is not a plugin, just a "fake" output for discussion. I would like to find out:
- whether such a plugin would be useful
- whether there are a few projects other than spaceflights that could be used during development/specification
- what the desired functionality is
- whether any collaborators are interested in building it
If you want to find out more, you can also clone the project repo; the manual has instructions on how to build the docs locally and use them. Leave your feedback on Slack or, if you have concrete ideas, please create issues in the repo.
Looking forward to hearing from you,
Sean
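To make the "fill in details automatically" idea a little more concrete, here is a minimal sketch of the kind of generator such a plugin might contain. It is not an existing plugin or Kedro API: `NodeInfo` and `render_pipeline_doc` are hypothetical names. In a real plugin the node names, inputs and outputs would be read from the project's pipelines (Kedro `Node` objects expose `name`, `inputs` and `outputs`); the example nodes below are only loosely modelled on the spaceflights tutorial.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NodeInfo:
    """Hypothetical stand-in for the metadata a plugin would read from each node."""
    name: str
    inputs: list[str]
    outputs: list[str]
    description: str = ""


def _is_parameter(name: str) -> bool:
    # Kedro convention: parameters are referenced as "parameters" or "params:<key>".
    return name == "parameters" or name.startswith("params:")


def render_pipeline_doc(pipeline_name: str, nodes: list[NodeInfo]) -> str:
    """Render a Markdown page listing nodes, datasets and parameters, plus a notes stub."""
    params = sorted({i for n in nodes for i in n.inputs if _is_parameter(i)})
    datasets = sorted(
        {d for n in nodes for d in (*n.inputs, *n.outputs) if not _is_parameter(d)}
    )
    lines = [
        f"# Pipeline: {pipeline_name}",
        "",
        "## Nodes",
        "",
        "| Node | Inputs | Outputs | Description |",
        "| --- | --- | --- | --- |",
    ]
    for n in nodes:
        lines.append(
            f"| {n.name} | {', '.join(n.inputs)} | {', '.join(n.outputs)} | {n.description} |"
        )
    lines += ["", "## Datasets", ""] + [f"- `{d}`" for d in datasets]
    lines += ["", "## Parameters", ""] + [f"- `{p}`" for p in params]
    # Empty template section for the hand-written notes.
    lines += ["", "## Notes", "", "_Add project-specific notes here._"]
    return "\n".join(lines) + "\n"


nodes = [
    NodeInfo("preprocess_companies_node", ["companies"], ["preprocessed_companies"],
             "Clean the raw companies table."),
    NodeInfo("split_data_node", ["model_input_table", "params:model_options"],
             ["X_train", "X_test", "y_train", "y_test"], "Train/test split."),
]
print(render_pipeline_doc("data_science", nodes))
```

Emitting plain Markdown keeps the output usable by most static site generators, so the publishing step (e.g. to GitHub Pages) stays a separate concern.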
Yetunde 12/06/2022, 3:02 PM
Matthias Roels 01/14/2023, 9:04 PM
Andrew Stewart 01/21/2023, 7:30 AM
Leo Casarsa 02/09/2023, 5:09 PM
Workflow orchestration tools
I have been crushing through the documentation of a bunch of different workflow orchestration tools. This is my inner map so far. [...]
Kubeflow, Metaflow, Flyte, Kedro, and ZenML focus more on ML pipelines and experimentation usability, like easy switching between local and cloud. Kubeflow is for ML what Argo is for data flows, so expect a steep learning curve if you are not a Kubernetes expert (which most data scientists are not); this might explain why it is frowned upon. All of these are new and shiny, but again I need to dig a little deeper to understand the differences. Kedro is opinionated about project structure and does not seem to be built with big, scalable workflows in mind. I got the feeling that Kedro is like DVC but aimed more at ML specifically, so it might be a good fit for consultants who are building many smaller projects (?). Metaflow, Flyte, and ZenML all deal with how to utilize compute clusters in an easy way. ZenML seems to me like it might have some gaps, but it is also the newest one, so that is to be expected at this point in time.
Another member then replies:
Thanks for starting the thread, it's very interesting!
I'd like to clarify that Kedro is a Python library for building modular data science pipelines. Kedro helps you write data science workflows that are made of reusable components, each with a "single responsibility".
Kedro is not an orchestration tool like Argo Workflows or Kubeflow Pipelines. Check out the deployment guide for how to run Kedro pipelines on Airflow, Argo Workflows or Kubeflow Pipelines. We have successfully used Kedro to build data-science-friendly pipelines that we can still run at scale with Kubeflow Pipelines.
https://mlops-community.slack.com/archives/C015J2Y9RLM/p1675865574676169
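The idea of reusable, single-responsibility components wired together by named inputs and outputs can be sketched in a few lines of plain Python. This is a toy illustration, not Kedro's API (Kedro's own building blocks are `kedro.pipeline.node` and its runners, with a data catalog and hooks on top); `Step` and `run` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Step:
    """One single-responsibility unit: a function plus named inputs and an output."""
    func: Callable[..., Any]
    inputs: tuple[str, ...]
    output: str


def run(steps: list[Step], data: dict[str, Any]) -> dict[str, Any]:
    """Run steps in dependency order: a step runs once all of its inputs exist."""
    data = dict(data)  # don't mutate the caller's dict
    pending = list(steps)
    while pending:
        ready = [s for s in pending if all(i in data for i in s.inputs)]
        if not ready:
            missing = sorted({i for s in pending for i in s.inputs if i not in data})
            raise ValueError(f"unresolvable inputs: {missing}")
        for s in ready:
            data[s.output] = s.func(*(data[i] for i in s.inputs))
            pending.remove(s)
    return data


def drop_missing(rows: list) -> list:
    return [r for r in rows if r is not None]


def total(rows: list) -> float:
    return sum(rows)


def mean(total_: float, rows: list) -> float:
    return total_ / len(rows)


# Steps are declared out of order; run() works out the order from the names.
steps = [
    Step(mean, ("total", "clean_rows"), "mean"),
    Step(total, ("clean_rows",), "total"),
    Step(drop_missing, ("raw_rows",), "clean_rows"),
]
result = run(steps, {"raw_rows": [1, None, 2, 3]})
print(result["mean"])  # → 2.0
```

Because the wiring lives only in the input and output names, the same list of steps could in principle be mapped onto tasks in an external orchestrator, which is roughly the split between pipeline authoring and orchestration described above.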
Amanda 03/02/2023, 1:16 PM
Polly 03/02/2023, 1:22 PM
Victoria Sicking 03/14/2023, 5:23 PM
Polly 03/18/2023, 3:05 PM
Deepyaman Datta 03/18/2023, 4:08 PM
Oleg Pilipenok 03/20/2023, 6:55 AM
Polly 03/22/2023, 12:52 PM
Polly 04/04/2023, 4:30 PM
Stephanie Kaiser 04/05/2023, 2:23 PM
Polly 04/26/2023, 2:20 PM
Merel 06/01/2023, 8:54 AM
Juan Luis 06/21/2023, 8:54 AM
Nok Lam Chan 08/09/2023, 8:26 PM
Yetunde 08/22/2023, 5:16 PM
straight from a Jupyter/Databricks/AWS SageMaker notebook without a project template or an IDE.
• party wizard: Use the project creation wizard to add features to your project template. Don't need the files and folders created for linting, testing, and documentation? No worries! Just skip those to get a simpler template.
We'd love your help testing these ideas! If you can spare 30 minutes to try either of them, indicate your interest with jupyter or party wizard. Your feedback will help make Kedro more flexible.
datajoely 09/28/2023, 1:56 PM
Juan Luis 10/13/2023, 12:17 PM
Deepyaman Datta 10/22/2023, 1:32 PM
users out there! We have a question for you, related to enabling versioning for `PartitionedDataset`: which of the below options makes the most sense to you?
1. https://github.com/kedro-org/kedro/pull/521 proposes to enable versioning of the underlying dataset, by specifying `versioned: true` in the `dataset` config:
```yaml
station_data:
  type: PartitionedDataset
  path: data/03_primary/station_data
  dataset:
    type: pandas.CSVDataset
    versioned: true
```
On the plus side, having the `versioned` config on the `dataset` config makes it clear that the versioning is applied to the underlying dataset, not to the `PartitionedDataset`. However, there are some edge cases (see https://github.com/kedro-org/kedro/pull/521#issuecomment-744653023, if you're keen).
2. Alternatively, we can move the `versioned` flag to the top level:
```yaml
station_data:
  type: PartitionedDataset
  path: data/03_primary/station_data
  versioned: true
  dataset:
    type: pandas.CSVDataset
```
Note that the versioning is still of the underlying dataset, even though the config is at the top level.
3. None of these options makes sense; what you really need is versioning of the top-level dataset. (Note that we don't have a solution designed for this case, but it would be great to know nonetheless!)
Please feel free to vote using 1️⃣2️⃣3️⃣, and elaborate further on your thoughts in the thread below!
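Because option 2 is described as still versioning the underlying dataset, it can be read as a shorthand for option 1. A minimal sketch of that reading, as a hypothetical pre-processing step over one catalog entry (`push_down_versioned` is not a Kedro function), shows both configs ending up identical. It assumes `dataset` is either a dict or a type string, and it surfaces one question option 2 would need to answer: what happens if `versioned` is set in both places (here the sketch simply raises).

```python
from __future__ import annotations

import copy
from typing import Any


def push_down_versioned(config: dict[str, Any]) -> dict[str, Any]:
    """Rewrite an option-2 entry (top-level `versioned`) into option-1 form.

    Hypothetical helper: returns a new dict and leaves the input untouched.
    """
    cfg = copy.deepcopy(config)
    if "versioned" not in cfg:
        return cfg  # nothing to move
    versioned = cfg.pop("versioned")
    if "dataset" not in cfg:
        raise ValueError("a PartitionedDataset entry needs a `dataset` key")
    dataset = cfg["dataset"]
    if isinstance(dataset, str):
        # shorthand form, e.g. `dataset: pandas.CSVDataset`
        dataset = {"type": dataset}
    if "versioned" in dataset and dataset["versioned"] != versioned:
        raise ValueError("conflicting `versioned` flags at top level and in `dataset`")
    dataset["versioned"] = versioned
    cfg["dataset"] = dataset
    return cfg


option_2 = {
    "type": "PartitionedDataset",
    "path": "data/03_primary/station_data",
    "versioned": True,
    "dataset": {"type": "pandas.CSVDataset"},
}
print(push_down_versioned(option_2))
# → {'type': 'PartitionedDataset', 'path': 'data/03_primary/station_data', 'dataset': {'type': 'pandas.CSVDataset', 'versioned': True}}
```

Under this reading the top-level flag is purely syntactic, which is why option 3 (versioning the partitioned dataset as a whole) would need a genuinely different design.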
Juan Luis 11/02/2023, 1:37 PM
), or next to the code (
)?
• when you create a new Kedro project, what are the steps you usually follow? For example:
  1. create and activate a conda environment
  2. pip install kedro
• what do you think of the current process?
(please leave a reply in the thread 🧵, 1 comment per person to keep the conversation tidy) your feedback and ideas are very much welcome 🙏🏼
Роман Белый 11/02/2023, 1:53 PM
Juan Luis 11/06/2023, 9:23 AM
and then read them in
https://github.com/kedro-org/kedro/blob/93dc1a91e4bb476287040ea3db4a610696cacb0c/k[…]project/%7B%7B%20cookiecutter.repo_name%20%7D%7D/pyproject.toml but you can also just avoid
files entirely. what do you think of this approach?