#random

Deepyaman Datta

03/02/2024, 6:28 PM
Thoughts on the recent PyAirbyte announcement? https://docs.airbyte.com/using-airbyte/pyairbyte/getting-started At least there's some Python-native "EL" solution (was there already something else?), even if it's just for prototyping. Trying to find something to complement the "T" with Kedro and Ibis in a Python-first analytics stack. (Also interesting that it defaults to writing to DuckDB. 🤔 Lots of defaulting to DuckDB these days...)
@Juan Luis I'm sure you have thoughts? 😛 I haven't tried Airbyte, and I think I saw you haven't, either, but I've heard very mixed things about it... But still, Python option... 😋

Nok Lam Chan

03/02/2024, 6:39 PM
So it's a built-in connector with many different sources? Seems pretty trivial to build a dataset for it. I've never used it, though.

Deepyaman Datta

03/02/2024, 8:56 PM
Yeah, I had the same thought that it could be a dataset... Usually, EL is managed separately, but maybe it's an advantage that Kedro can include it in its abstraction?
👍 1

Nok Lam Chan

03/03/2024, 12:11 AM
https://github.com/noklam/airbyte-dataset-example/blob/main/PyAirbyte_Github_Incremental_Demo.ipynb Did something to test the idea quickly. The connectors basically sink everything to a DuckDB cache. Should these EL steps be combined in one node? Or is it more natural to run the connector as a separate step, and then just load the DuckDB data as node inputs afterwards?

Deepyaman Datta

03/03/2024, 12:12 AM
The latter, I think

Juan Luis

03/03/2024, 8:13 AM
first impression:
import airbyte as ab  # PyAirbyte

source = ab.get_source(
    "source-faker",
    config={"count": 5_000},
    install_if_missing=True,  # lets PyAirbyte install the connector if it is absent
)
I don't like tools that install stuff on my behalf 🤷🏼 https://github.com/airbytehq/PyAirbyte/blob/9fdce038563e8c6ea422fda1c7f2b76cd4b6e4e1/airbyte/_executor.py#L196-L209
second impression: after reading the quickstart snippet and skimming over 2 demos, it's clear to me how to read data from a source, but I'm not seeing how to load data to a target? is it maybe that it's not supported?
so, thoughts:
1. I think it's a step in the right direction for them. making installation easier will only make adoption easier. in fact, I chose Meltano over Airbyte for an ELT stack 2 years ago because of this reason.
2. the docs and demos didn't generate much excitement in me. I know Airbyte calling pip calls for trouble (to be fair, it's opt-out, but just seeing it gave me goosebumps). having said that, I think after a quick exploration it's clear how to use it and I might give it a try.
3. Airbyte needs to overcome a bad reputation of having broken connectors.
4. and finally (but this is a very personal thing), as much as I dislike the state of open source EL(T) tooling, I am really not a fan of Airbyte marketing https://airbyte.com/blog/why-you-should-not-build-your-data-pipeline-on-top-of-singer

Nok Lam Chan

03/03/2024, 1:24 PM
I have a very rough dataset implementation above, I think the load method should just return the cache and maybe save is not implemented.
👀 1
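For anyone skimming the thread, the "load returns the cache, save is not implemented" idea might look roughly like the sketch below. It is a dependency-free stand-in, not the implementation from the linked notebook: the class name `ReadOnlyCacheDataset` and the `read_source` callable are hypothetical, and a real version would presumably subclass Kedro's `AbstractDataset` and wrap something like `ab.get_source(...).read()` inside `read_source`.

```python
from typing import Any, Callable, Dict


class ReadOnlyCacheDataset:
    """Hypothetical read-only dataset: load() runs the extract step, save() is unsupported."""

    def __init__(self, read_source: Callable[[], Any]) -> None:
        # Assumption: in practice this callable would wrap a PyAirbyte read
        # and return whatever the connector wrote to its local cache.
        self._read_source = read_source

    def load(self) -> Any:
        # Hand the cached result straight to downstream nodes.
        return self._read_source()

    def save(self, data: Any) -> None:
        # The connector only extracts, so writing back is deliberately unsupported.
        raise NotImplementedError("This dataset is read-only.")

    def describe(self) -> Dict[str, Any]:
        # Short description of the dataset, useful in logs and error messages.
        name = getattr(self._read_source, "__name__", repr(self._read_source))
        return {"read_source": name}


def fake_read() -> Dict[str, list]:
    # Stand-in for a real connector read; returns stream name -> records.
    return {"users": [{"id": 1}, {"id": 2}]}


dataset = ReadOnlyCacheDataset(fake_read)
cache = dataset.load()
```

Injecting the read step as a callable keeps the dataset itself free of connector setup, which also fits the "run the connector separately, then load the DuckDB data" option discussed earlier in the thread.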