#questions

Emilio Gagliardi

11/27/2023, 8:00 PM
Hi kedro peeps 🙂 What is the Kedro approach to handling pipelines provided by imported packages? For example, I'm building a project that uses Llama Index for RAG functionality. In their newest version they've released an ingestion pipeline construct where you specify a sequence of transformation classes to ingest/preprocess your data. The benefit is that each step is cached (in memory or in an external database). It looks like:
```
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=25, chunk_overlap=0),
        TitleExtractor(),
        OpenAIEmbedding(),
    ]
)
```
Now, in this context, if I put that pipeline definition inside a single Kedro node, the node no longer performs a single task, but I don't understand how to split it across multiple Kedro nodes. It also makes me think of scikit-learn pipelines... Any wisdom or advice is greatly appreciated!

Deepyaman Datta

11/27/2023, 9:35 PM
This is something that comes up now and again, but no great solution exists. See https://kedro-org.slack.com/archives/C03RKP2LW64/p1666869042994569?thread_ts=1666864575.162609&cid=C03RKP2LW64 for some context.

Emilio Gagliardi

11/27/2023, 10:02 PM
OK, thank you kindly for illuminating the issue here. It sounds like the most practical approach is to just run the package's pipeline within a single Kedro node. Cheers!
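
For anyone landing on this thread later, here is a minimal sketch of that single-node approach. It is not an official Kedro or LlamaIndex pattern, just one way to wire it up. The toy transformations `split_sentences` and `add_length`, the function name `run_ingestion`, and the dataset names `raw_documents` and `ingested_nodes` are all hypothetical stand-ins. In a real project the body of `run_ingestion` would build the package's `IngestionPipeline` and run it on the documents instead.

```python
from typing import Any, Callable, List, Sequence

# Hypothetical stand-ins for the package's transformations
# (e.g. SentenceSplitter, TitleExtractor). Each takes a list and returns a list.
Transformation = Callable[[List[Any]], List[Any]]


def split_sentences(docs: List[str]) -> List[str]:
    """Toy splitter: break each document on full stops."""
    return [s.strip() for doc in docs for s in doc.split(".") if s.strip()]


def add_length(chunks: List[str]) -> List[dict]:
    """Toy extractor: attach one metadata field to each chunk."""
    return [{"text": c, "n_chars": len(c)} for c in chunks]


def run_ingestion(documents: List[str]) -> List[dict]:
    """Node function: run the whole package pipeline as one Kedro step.

    Assumption: with LlamaIndex, this body would construct the
    IngestionPipeline from the question and run it on `documents`.
    """
    steps: Sequence[Transformation] = [split_sentences, add_length]
    result: List[Any] = documents
    for step in steps:
        result = step(result)
    return result


# The Kedro wiring is guarded so the sketch still runs without Kedro installed.
try:
    from kedro.pipeline import Pipeline, node

    ingestion = Pipeline(
        [
            node(
                run_ingestion,
                inputs="raw_documents",    # hypothetical catalog entry
                outputs="ingested_nodes",  # hypothetical catalog entry
                name="ingest_documents",
            )
        ]
    )
except ImportError:
    ingestion = None
```

The trade-off is the one raised in the question: Kedro sees ingestion as a single opaque step, so any per-step caching stays the package's responsibility, while Kedro only tracks the node's input and output datasets.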