Eduardo Romero López
07/12/2023, 12:37 PM
Eduardo Romero López
07/12/2023, 12:38 PM
Eduardo Romero López
07/12/2023, 12:39 PM
J. Camilo V. Tieck
07/12/2023, 9:32 PM
Leslie Wu
07/13/2023, 12:51 PM
kedro.io.core.DataSetError: Failed while loading data from data set ExcelDataSet(filepath=my/s3/path/file.xlsx, load_args={'engine': openpyxl, 'sheet_name': Sheet1}, protocol=s3, save_args={'index': False}, writer_args={'engine': xlsxwriter}).
my/s3/path/file.xlsx
I have no issues with other formats (parquet / csv / PDF). Has anyone seen this before, or have any insight into where I am going wrong?
FYI, I am using kedro==0.17.7
Michel van den Berg
07/13/2023, 1:00 PM
Merel
07/13/2023, 4:09 PM
.xlsx file into a Kedro SparkDataSet?
Rachid Cherqaoui
07/14/2023, 5:30 PM
Nelson Zambrano
07/15/2023, 11:02 PM
Dawid Bugajny
07/17/2023, 9:14 AM
with KedroSession.create(...) as session:
    context = session.load_context()
    cat = context.catalog
    return SequentialRunner().run(catalog=cat, pipeline=pipeline)[...]
I have just discovered that my API is single-threaded and new requests have to wait until previous requests finish. Does anybody have a solution for this problem and know how to make the API multithreaded?
Eduardo Romero López
07/17/2023, 9:49 AM
Jo Stichbury
07/17/2023, 3:41 PM
Higor Carmanini
07/17/2023, 7:26 PM
pylance incorrectly inferring that the pipeline function (as imported from kedro.pipeline) is actually a module. It gets in the way of showing the proper documentation for kedro.pipeline.modular_pipeline.pipeline(), and I figure it could turn some less Kedro-savvy devs away by making them think they're doing it wrong (me a while back 🙃)
Rachid Cherqaoui
07/17/2023, 9:04 PM
PartitionedDataSet function from kedro.io to load data, but I've just seen that this function doesn't take the delimiter into account. How can I solve this? (I'm working with CSV files locally; here is the code used:)
data_set = PartitionedDataSet(
    path="data/01_raw/Tableaux",
    dataset=CSVDataSet,
    filename_suffix=".csv",
    load_args={"delimiter": ";", "header": 0, "encoding": "utf-8"},
)
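One possible explanation, offered as an assumption rather than a confirmed diagnosis: in Kedro 0.18.x, the `load_args` given directly to `PartitionedDataSet` are forwarded to the filesystem's `find()` call, not to the underlying dataset, so the delimiter would never reach `CSVDataSet`. Options for the underlying dataset can instead be supplied by passing `dataset` as a dictionary. The sketch below shows that shape; the helper name `build_partitioned_dataset` is hypothetical, and the import path assumes a Kedro version in which `kedro.io.PartitionedDataSet` exists.

```python
# Configuration for the underlying dataset: every partition is loaded with
# these arguments. The "type" string is resolved by Kedro at construction time.
dataset_config = {
    "type": "pandas.CSVDataSet",
    "load_args": {"delimiter": ";", "header": 0, "encoding": "utf-8"},
}


def build_partitioned_dataset(path="data/01_raw/Tableaux"):
    """Hypothetical helper: build a PartitionedDataSet with per-file load_args."""
    # Imported lazily so the configuration above can be inspected without Kedro.
    from kedro.io import PartitionedDataSet  # assumption: Kedro 0.18.x

    return PartitionedDataSet(
        path=path,
        dataset=dataset_config,  # load_args live here, not on PartitionedDataSet
        filename_suffix=".csv",
    )
```

Calling `build_partitioned_dataset().load()` should return a dictionary that maps each partition id to a load function; calling one of those functions reads the corresponding CSV file.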
Marc Gris
07/18/2023, 4:47 AM
Jackson
07/18/2023, 6:54 AM
class VectorStore:
    def __init__(self, client_path, embedding_func) -> None:
        self.collections = None
        self.client = chromadb.PersistentClient(path=client_path)
        self.embedding_func = embedding_func

    def create_collections(self, collection_name):
        self.collections = self.client.create_collection(collection_name, self.embedding_func)
        return self.collections

def add_docs(collections, embeddings, metadatas, ids):
    collections.add(
        embeddings=embeddings,
        metadatas=metadatas,
        ids=ids,
    )
However, putting this inside nodes.py doesn't seem ideal, because I still have other classes (like a model class), and I believe mixing everything inside nodes.py is an anti-pattern. But writing a standalone function in nodes.py like the one below seems redundant.
def create_collections(collections, collections_name):
    collections.create_collections(collections_name)
So my question is: what is the best way to separate classes and nodes while avoiding redundant code?
Daniel Lee
07/18/2023, 8:42 AM
DataCatalog, I would like pandas.ParquetDataset to partition by the date in the dataset and save into different folders by date in parquet, like we can with spark.SparkDataSet. Is there a way we could partition using pandas?
Zemeio
07/18/2023, 9:26 AM
{%- for item in mylist %}
out.blind_predictions_{{ item-}}:
  type: pandas.CSVDataSet
  filepath: ${filepath1}_{{ item-}}.csv
  layer: out
{% endfor %}
Globals:
mystli:
  - item1
  - item2
(For obvious reasons I removed the actual names from the text here)
Does anyone know how to accomplish this? (do a for here)
Marc Gris
07/18/2023, 1:12 PM
catalog.yml values that are defined in parameters.yml
ex:
in conf/base/parameters.yml
tenant_id: xyz
and in conf/base/catalog.yml
_tenant_id: ${tenant_id}
Thx in advance
Rachid Cherqaoui
07/18/2023, 3:00 PM
Rachid Cherqaoui
07/19/2023, 9:29 AM
kedro run --async, it takes significantly less time compared to when I use KedroSession.create().run() with FastAPI (note that I declared my POST function with async def). My question is: how can I use the async argument with KedroSession, whether at the level of hooks or otherwise? Thank you in advance.
Marc Gris
07/19/2023, 10:01 AM
Marc Gris
07/19/2023, 11:19 AM
Rachid Cherqaoui
07/19/2023, 1:23 PM
kedro run --async, it takes significantly less time compared to when I use KedroSession.create().run() with FastAPI (note that I declared my POST function with async def). My question is: how can I use the async argument with KedroSession, whether at the level of hooks or otherwise? Thank you in advance.
Marc Gris
07/19/2023, 2:11 PM
ModularPipelineError: Inputs should be free inputs to the pipeline
Could someone kindly unpack / explain it?
Thx
Cyril Verluise
07/19/2023, 5:55 PM
Vincent Liagre
07/20/2023, 11:44 AM
src/my_module) with pip install -e src; now kedro is looking for data within the my_module folder from root for some reason. Any clue what's going on here and how I can solve this?
Happy to provide more details if required 🙂
Marc Gris
07/20/2023, 2:47 PM
local/catalog.yml does not override the `base/catalog.yml`…
Any idea what could cause this behavior?
Thx
M.
Christos Malliopoulos
07/21/2023, 11:18 AM
Nok Lam Chan
07/21/2023, 3:32 PM
df.describe()
• Needs to work on Windows and Linux, so wc is not an option
• Needs to be fast
• Bonus: is it possible to generalise this to Excel file types?
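The start of this message is missing, so the following is a guess at the goal: assuming the requirements are about counting rows in a large CSV without loading it into pandas, a minimal standard-library sketch is to scan the file in binary chunks and count newline bytes. The function name `count_lines` and the chunk size are illustrative choices. It counts physical lines (including any header), so quoted fields containing line breaks would be over-counted, and it does not cover the Excel case.

```python
def count_lines(path, chunk_size=1 << 20):
    """Count lines in a file by scanning it in binary chunks (works on any OS)."""
    lines = 0
    last_byte = b"\n"  # an empty file has no dangling final line
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            lines += chunk.count(b"\n")  # "\r\n" endings also end in "\n"
            last_byte = chunk[-1:]
    # A final line without a trailing newline still counts as a line.
    if last_byte != b"\n":
        lines += 1
    return lines
```

Because only fixed-size chunks are held in memory, the cost is dominated by disk reads rather than parsing.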