# questions
r
Hi everyone, I have a versioned `.txt` file generated by a Kedro pipeline that I created, and I'd like to send it to a folder on a remote server via SFTP. After several attempts, I found it quite tricky to handle this cleanly within Kedro, especially while keeping things consistent with its data catalog and hooks system. Would anyone be able to help or share best practices on how to achieve this with Kedro? Thanks in advance for your support!
Any help, please?
j
Hey Rachid, I am looking into it. Thanks.
r
@Jitendra Gundaniya Thanks, I'm waiting for your suggestions ^^
j
I think hooks would be the most appropriate solution for this, specifically using the `after_dataset_saved` hook to upload files through SFTP. Please check out the hooks docs.
r
Thanks for the suggestion. I did look into using the `after_dataset_saved` hook, but I haven't managed to get it working properly, especially since my dataset is a plain `.txt` file (a `TextDataset`). I'm a bit unsure how to extract the right file path (with versioning), and how to connect that to the SFTP upload. Would it be possible for you to share a concrete example of how to use `after_dataset_saved` to upload a versioned `.txt` file to an SFTP server? That would help a lot. Thanks in advance!
s
Hi @Rachid Cherqaoui, you could try `resolve_load_version`: check whether the dataset is versioned and build the file path from the resolved version. Here is an example that might work:
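from pathlib import Path

from kedro.framework.hooks import hook_impl
from kedro.io import DataCatalog


class SFTPUploadHooks:  # Placeholder class name
    datasets_to_upload = ["your_text_output"]  # Replace with your own

    @hook_impl
    def after_catalog_created(self, catalog: DataCatalog) -> None:
        # after_dataset_saved does not receive the catalog, so keep a reference here
        self._catalog = catalog

    @hook_impl
    def after_dataset_saved(self, dataset_name: str) -> None:
        if dataset_name not in self.datasets_to_upload:  # Some logic to decide which to upload
            return

        # Get the dataset from the catalog (private attribute, may change between Kedro versions)
        dataset = self._catalog._datasets[dataset_name]

        if not hasattr(dataset, "_filepath"):
            return  # No local file path to upload

        base_path = Path(dataset._filepath)

        if getattr(dataset, "_version", None):
            # For versioned datasets, Kedro stores the file as <filepath>/<version>/<filename>
            load_version = dataset.resolve_load_version()
            local_file_path = base_path / load_version / base_path.name
        else:
            # For non-versioned datasets
            local_file_path = base_path

        # Upload to SFTP Example (_upload_to_sftp is left for you to implement)
        self._upload_to_sftp(local_file_path, dataset_name)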
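The example leaves `_upload_to_sftp` undefined, so here is one possible sketch of the upload side. It assumes the third-party `paramiko` library and password authentication. The function names (`build_remote_path`, `upload_file`, `upload_via_paramiko`) and the `remote_dir` parameter are made up for illustration and are not part of Kedro. The upload logic takes any client object with a `put(localpath, remotepath)` method, which `paramiko.SFTPClient` provides, so it can be tried out without a real server.

```python
from pathlib import Path, PurePosixPath


def build_remote_path(local_path: Path, remote_dir: str) -> str:
    """Join the remote folder and the local file name using POSIX separators."""
    return str(PurePosixPath(remote_dir) / Path(local_path).name)


def upload_file(sftp, local_path: Path, remote_dir: str) -> str:
    """Upload one file through an already-open SFTP client; return the remote path.

    `sftp` only needs a `put(localpath, remotepath)` method.
    """
    remote_path = build_remote_path(local_path, remote_dir)
    sftp.put(str(local_path), remote_path)
    return remote_path


def upload_via_paramiko(local_path: Path, remote_dir: str, host: str,
                        username: str, password: str, port: int = 22) -> str:
    """Open an SSH connection with paramiko, upload the file, then close everything."""
    import paramiko  # third-party dependency, imported lazily: pip install paramiko

    client = paramiko.SSHClient()
    client.load_system_host_keys()  # only hosts already in known_hosts are accepted
    client.connect(hostname=host, port=port, username=username, password=password)
    try:
        sftp = client.open_sftp()
        try:
            return upload_file(sftp, local_path, remote_dir)
        finally:
            sftp.close()
    finally:
        client.close()
```

Inside the hook, `_upload_to_sftp` could then simply call `upload_via_paramiko(local_file_path, ...)` with the host, user, and target folder of your server. Keeping the connection details out of the code (for example in your project's credentials configuration) is probably the safer choice.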