Pedro Sousa Silva
12/05/2023, 11:32 AM
marrrcin
12/05/2023, 12:03 PM
Pedro Sousa Silva
12/05/2023, 12:08 PM
Pedro Sousa Silva
12/05/2023, 2:08 PM
marrrcin
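12/05/2023, 2:17 PM
import numpy as np
import pandas as pd
from kedro.pipeline import node, pipeline


def pandas_appender_pipeline():
    return pipeline(
        [
            # Generates a fresh 5-row frame on every run and saves it to "csv_appender"
            node(
                func=lambda: pd.DataFrame(
                    {
                        "A": np.random.randint(0, 512, 5),
                        "B": np.random.randint(512, 1024, 5),
                    }
                ),
                inputs=None,
                outputs="csv_appender",
                name="generate_dataframe",
            ),
            # Loads "csv_appender" back and prints it
            node(
                func=lambda df: print(df),
                inputs="csv_appender",
                outputs=None,
                name="print_dataframe",
            ),
        ]
    )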
Catalog:
csv_appender:
  type: pandas.GenericDataset
  file_format: csv
  filepath: data/03_primary/csv_appender.csv
  fs_args:
    open_args_save:
      mode: a
  save_args:
    mode: a
    index: false
    header: false
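For reference, the append behaviour this catalog entry asks for can be reproduced with plain pandas. This is only a minimal sketch, not what Kedro does internally: the helper names `append_frame` and `make_frame` and the temporary file path are hypothetical, and it assumes the goal is the equivalent of `DataFrame.to_csv(..., mode="a", index=False, header=False)`.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def append_frame(df: pd.DataFrame, path: Path) -> None:
    """Append rows to a CSV file without index or header (hypothetical helper)."""
    df.to_csv(path, mode="a", index=False, header=False)


def make_frame() -> pd.DataFrame:
    # Same shape as the frame generated in the pipeline: 5 rows, columns A and B.
    return pd.DataFrame(
        {
            "A": np.random.randint(0, 512, 5),
            "B": np.random.randint(512, 1024, 5),
        }
    )


# Temporary location used only for this illustration.
csv_path = Path(tempfile.mkdtemp()) / "csv_appender.csv"

append_frame(make_frame(), csv_path)  # first call creates the file
append_frame(make_frame(), csv_path)  # second call adds five more rows

# The file has no header row, so column names are supplied on read.
combined = pd.read_csv(csv_path, header=None, names=["A", "B"])
print(len(combined))
```

Because the header is suppressed on every save, the file never gets a header row, so anything reading it back has to supply the column names itself.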
Pedro Sousa Silva
12/05/2023, 2:34 PM
marrrcin
12/05/2023, 2:47 PM
Pedro Sousa Silva
12/05/2023, 3:20 PM
try:
    # check whether the dataset can be loaded (i.e. the file exists)
    catalog.load('dataset_2')
except Exception:
    # otherwise create an empty dataset and save it
    from kedro_datasets.pandas import ParquetDataset
    import pandas as pd

    df2_loc = ParquetDataset(filepath="dataset_2.parquet")
    df2_loc.save(pd.DataFrame(columns=['col1', 'col2']))
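The same "create it if it is missing" idea can also be sketched without loading the data first. This is an illustration under stated assumptions, not the approach used in the thread: `ensure_dataset` is a hypothetical helper, the file lives in a temporary directory, and it writes CSV rather than Parquet so the sketch runs without a Parquet engine installed.

```python
import tempfile
from pathlib import Path

import pandas as pd


def ensure_dataset(path: Path, columns: list[str]) -> bool:
    """Create an empty file with the given columns if `path` is missing.

    Returns True when the file was created, False when it already existed.
    Hypothetical helper, for illustration only.
    """
    if path.exists():
        return False
    # An empty frame still writes its header row, so the schema is preserved.
    pd.DataFrame(columns=columns).to_csv(path, index=False)
    return True


# Temporary location used only for this illustration.
target = Path(tempfile.mkdtemp()) / "dataset_2.csv"

created_first = ensure_dataset(target, ["col1", "col2"])   # file is missing, so it is created
created_second = ensure_dataset(target, ["col1", "col2"])  # file exists, so nothing happens

empty = pd.read_csv(target)
```

Checking for the file directly avoids reading the whole dataset just to find out whether it exists, and avoids catching unrelated errors along the way.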
marrrcin
12/05/2023, 3:30 PM
Pedro Sousa Silva
12/05/2023, 4:11 PM