Kedro #questions

Mikołaj Tym

04/28/2025, 1:44 PM

Hi, I'm encountering an issue where Kedro adds extra empty lines between each data row when saving a CSV file using the pandas.CSVDataset type. This results in empty rows when opening the file as text. Have you encountered this issue, and do you know how I can prevent these extra lines from being added during the save process? It looks similar to this issue: https://github.com/kedro-org/kedro/issues/492

Elena Khaustova

04/28/2025, 1:49 PM

Hey, can you please share how you’ve set up your dataset and/or a minimal example to reproduce?

Mikołaj Tym

04/28/2025, 2:24 PM

Hi, thanks for your reply. catalog.yml:

# Here you can define all your datasets by using simple YAML syntax.
#
# Documentation for this file format can be found in "The Data Catalog"
# Link: <https://docs.kedro.org/en/stable/data/data_catalog.html>

test_csv:
  type: pandas.CSVDataset
  filepath: data/df_test.tsv
  load_args:
    sep: "\t"
    keep_default_na: False
    encoding: "utf-8"
  save_args:
    index: false
    sep: "\t"
    header: False

Example pipeline.py:

"""
This is a boilerplate pipeline 'create_df'
generated using Kedro 0.19.10
"""

from kedro.pipeline import Pipeline, pipeline, node

from .nodes import create_df


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline([
        node(
            func=create_df,
            inputs=None,
            outputs='test_csv',
            name='create_df',
        )
    ])

nodes.py:

"""
This is a boilerplate pipeline 'create_df'
generated using Kedro 0.19.10
"""
import pandas as pd

def create_df():
    data = {
        "label": ["A", "B", "C"],
        "country": ["GB", "ES", "FR"]
    }
    df = pd.DataFrame(data)
    return df

Running kedro run creates the CSV file. This file contains an extra \n between the lines. I tried to adjust the lineterminator parameter, but it does not help. I'm using Windows 11.

data_path = 'C:/csv-extra-line/data/df_test.tsv'
with open(data_path, 'r', encoding='utf8') as file:
    lines = file.readlines()

for i, line in enumerate(lines[:5]):
    line_str = f"Line {i + 1}: '{line}'"
    print(repr(line_str))

Code output:

Nok Lam Chan

04/28/2025, 3:24 PM

Which versions of kedro-datasets and fsspec do you have?

Elena Khaustova

04/28/2025, 3:26 PM

It also looks OK on macOS.

Elena Khaustova

04/28/2025, 3:29 PM

@Mikołaj Tym can you please double-check whether writing the same df with just pandas, without Kedro, also produces these empty lines?

Mikołaj Tym

04/28/2025, 6:17 PM

@Nok Lam Chan and kedro version 0.19.12

Mikołaj Tym

04/28/2025, 6:19 PM

@Elena Khaustova Using df.to_csv, the output is correct (without extra lines).
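Editorial note: the thread never confirms a root cause, so what follows is only one plausible explanation. pandas documents that `lineterminator` defaults to `os.linesep`, which is "\r\n" on Windows. If rows ending in "\r\n" are written through a file handle opened in text mode, Python translates each "\n" to "\r\n" again, and the file ends up with "\r\r\n", which many tools display as a blank line after every row. Whether the Kedro dataset opens the file this way in the versions used here is an assumption. The sketch below uses only the standard library; `write_rows` is a hypothetical helper, and `io.TextIOWrapper(newline="\r\n")` stands in for Windows text mode so the effect can be reproduced on any OS.

```python
import io


def write_rows(rows, lineterminator, newline):
    """Write tab-separated rows through a text layer and return the raw bytes.

    newline="\r\n" emulates Windows text mode (every "\n" written becomes "\r\n").
    newline="" disables translation, so the bytes are written exactly as given.
    """
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    for row in rows:
        text.write("\t".join(row) + lineterminator)
    text.flush()
    data = raw.getvalue()
    text.detach()  # release the buffer without closing it
    return data


rows = [("A", "GB"), ("B", "ES"), ("C", "FR")]

# Assumed failure mode: "\r\n" terminator written through a translating handle.
doubled = write_rows(rows, lineterminator="\r\n", newline="\r\n")

# Workaround from the thread: an explicit "\n" terminator is translated only once.
fixed = write_rows(rows, lineterminator="\n", newline="\r\n")

# No translation at all: the "\r\n" terminator is written unchanged.
untranslated = write_rows(rows, lineterminator="\r\n", newline="")

print(doubled)
print(fixed)
print(untranslated)
```

Reading the saved file in binary mode (`open(path, "rb").read()`) is a quick way to check whether the real file contains "\r\r\n", since reading in text mode hides the carriage returns.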

Elena Khaustova

04/29/2025, 10:45 AM

I can’t check it on Windows, and it works fine on macOS, but it looks like a bug, possibly something with how the tab separator is encoded. Feel free to create an issue so we can investigate and address it. Sorry about such an experience 😔

Mikołaj Tym

04/29/2025, 11:39 AM

I really appreciate your help! I discovered that adding the lineterminator: "\n" parameter to save_args solves the issue, so there are no extra lines anymore. Yesterday, I added this parameter using single quotes, which caused YAML to misinterpret it and save the whole file as a single line on Windows. Using double quotes fixes the problem. However, it is not a default parameter, and these extra lines are unexpected output. If you think it is worth investigating further, I can create an issue; otherwise, the lineterminator parameter works as a workaround.
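Editorial note on the quoting difference: in YAML, a double-quoted "\n" is an escape sequence for a single newline character, while a single-quoted '\n' is not unescaped and stays as two literal characters, a backslash followed by n. The sketch below shows what each value does when used as a line terminator. It is only an illustration: it uses the standard-library csv module as a stand-in for pandas, writes the YAML results directly as Python string literals rather than parsing YAML, and `render` is a hypothetical helper.

```python
import csv
import io


def render(rows, lineterminator):
    """Return the tab-separated text produced with the given line terminator."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator=lineterminator)
    writer.writerows(rows)
    return buf.getvalue()


rows = [("A", "GB"), ("B", "ES"), ("C", "FR")]

# YAML double quotes, lineterminator: "\n" -> one newline character.
double_quoted = render(rows, "\n")

# YAML single quotes, lineterminator: '\n' -> backslash + "n" (two characters).
single_quoted = render(rows, "\\n")

print(repr(double_quoted))
print(repr(single_quoted))
print(len(double_quoted.splitlines()), len(single_quoted.splitlines()))
```

With the two-character terminator there is no real line break anywhere in the output, which matches the "whole file as a single line" symptom described above.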

Elena Khaustova

04/29/2025, 12:37 PM

Thanks for sharing your workaround. In this case, we don’t need a separate issue for that. Hope that your further experience will be smoother!
