Mikołaj Tym
04/28/2025, 1:44 PM
Elena Khaustova
04/28/2025, 1:49 PM
Mikołaj Tym
04/28/2025, 2:24 PM
# Here you can define all your datasets by using simple YAML syntax.
#
# Documentation for this file format can be found in "The Data Catalog"
# Link: <https://docs.kedro.org/en/stable/data/data_catalog.html>
test_csv:
  type: pandas.CSVDataset
  filepath: data/df_test.tsv
  load_args:
    sep: "\t"
    keep_default_na: False
    encoding: "utf-8"
  save_args:
    index: false
    sep: "\t"
    header: False
Example pipeline.py:
"""
This is a boilerplate pipeline 'create_df'
generated using Kedro 0.19.10
"""
from kedro.pipeline import Pipeline, pipeline, node

from .nodes import create_df


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline([
        node(
            func=create_df,
            inputs=None,
            outputs='test_csv',
            name='create_df',
        )
    ])
nodes.py:
"""
This is a boilerplate pipeline 'create_df'
generated using Kedro 0.19.10
"""
import pandas as pd


def create_df():
    data = {
        "label": ["A", "B", "C"],
        "country": ["GB", "ES", "FR"]
    }
    df = pd.DataFrame(data)
    return df
So kedro run creates the CSV file. The file contains an extra \n between the lines. I tried adjusting the lineterminator parameter, but it did not help. I'm using Windows 11.
data_path = 'C:/csv-extra-line/data/df_test.tsv'
with open(data_path, 'r', encoding='utf8') as file:
    lines = file.readlines()

for i, line in enumerate(lines[:5]):
    line_str = f"Line {i + 1}: '{line}'"
    print(repr(line_str))
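A note on the check above: reading in text mode converts every "\r" and "\r\n" to "\n", so it cannot show which bytes are actually on disk. The sketch below is one possible explanation of the extra lines, not a confirmed diagnosis. It assumes rows are terminated with "\r\n" (the pandas default on Windows, os.linesep) and then written through a text-mode handle that expands "\n" again. It uses only the standard library; the helper name, the temporary file, and newline="\r\n" (which mimics Windows text-mode translation on any platform) are assumptions made for illustration.

```python
import os
import tempfile


def write_rows_like_windows(path, rows, row_terminator="\r\n"):
    # Hypothetical helper: newline="\r\n" makes the handle translate every
    # "\n" it receives into "\r\n", as text mode does by default on Windows.
    with open(path, "w", encoding="utf-8", newline="\r\n") as handle:
        for row in rows:
            handle.write("\t".join(row) + row_terminator)


rows = [["A", "GB"], ["B", "ES"], ["C", "FR"]]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "df_test.tsv")
    write_rows_like_windows(path, rows)

    # Binary mode shows the bytes exactly as they are on disk.
    with open(path, "rb") as handle:
        raw_bytes = handle.read()

    # Text mode applies universal newlines, so a lone "\r" becomes "\n" too.
    with open(path, "r", encoding="utf-8") as handle:
        text_lines = handle.readlines()

    # Same rows, but terminated with a bare "\n" before translation.
    fixed_path = os.path.join(tmp_dir, "df_fixed.tsv")
    write_rows_like_windows(fixed_path, rows, row_terminator="\n")
    with open(fixed_path, "rb") as handle:
        fixed_bytes = handle.read()

print(raw_bytes)    # each row ends in "\r\r\n"
print(text_lines)   # every second entry is an empty line
print(fixed_bytes)  # each row ends in a single "\r\n"
```

If the real file shows "\r\r\n" when opened with mode 'rb', that would be consistent with this double translation. Whether Kedro's CSVDataset opens the file this way is not established here.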
Code output:
Nok Lam Chan
04/28/2025, 3:24 PM
kedro-datasets and fsspec you have?
Elena Khaustova
04/28/2025, 3:26 PM
Elena Khaustova
04/28/2025, 3:29 PM
Mikołaj Tym
04/28/2025, 6:17 PM
Mikołaj Tym
04/28/2025, 6:19 PM
df.to_csv output is correct (without extra lines).
Elena Khaustova
04/29/2025, 10:45 AM
Mikołaj Tym
04/29/2025, 11:39 AM
lineterminator: "\n" parameter to save_args solves the issue, so there is no extra line anymore.
Yesterday I added this parameter using single quotes, so YAML read it as a literal backslash followed by n rather than a newline, and the whole file was saved as a single line on Windows. Using double quotes fixes the problem.
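The quoting difference can be sketched without Kedro. In YAML, escape sequences are processed only inside double-quoted scalars, so "\n" loads as one newline character while '\n' loads as two characters. The example below hard-codes those two loaded values and uses the standard csv module as a stand-in for the pandas writer (an assumption made for illustration); the render helper is hypothetical.

```python
import csv
import io

rows = [["A", "GB"], ["B", "ES"], ["C", "FR"]]


def render(rows, terminator):
    # Hypothetical helper: serialise rows as tab-separated text
    # using the given row terminator.
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator=terminator)
    writer.writerows(rows)
    return buffer.getvalue()


# Value a YAML loader yields for the double-quoted scalar "\n": a newline.
double_quoted = "\n"
# Value it yields for the single-quoted scalar '\n': backslash plus letter n.
single_quoted = "\\n"

as_intended = render(rows, double_quoted)
as_single_line = render(rows, single_quoted)

print(repr(as_intended))
print(repr(as_single_line))
print(len(as_single_line.splitlines()))
```

With the two-character terminator the rows are joined by a visible backslash-n and the output never contains a real line break, which matches the single-line file described above.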
However, it is not a default parameter, and these extra lines are unexpected output.
If you think it is worth investigating further, I can create an issue. Otherwise, the lineterminator parameter works as a workaround.
Elena Khaustova
04/29/2025, 12:37 PM