# questions
p
Hey team! Does kedro's JSONDataset support JSON lines?
m
Hi @Puneet Saini, I am not sure because I haven't tried it, but it might be possible.
p
Did a quick check. Seems it is possible.
l
Is it? I was just checking and it looks like `json.load(fs_file)` is used, which does not support `jsonl` files out of the box.
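For anyone following along, here is a minimal sketch of why a plain `json.load` call trips on a JSON Lines file. It uses only the standard library; the sample text and variable names are made up for illustration and are not taken from kedro's source.

```python
import io
import json

# Two JSON documents, one per line (JSON Lines). Sample data is hypothetical.
jsonl_text = '{"id": 1}\n{"id": 2}\n'

# json.load expects exactly one JSON document in the stream,
# so the second line makes it raise a JSONDecodeError.
try:
    json.load(io.StringIO(jsonl_text))
    whole_file_ok = True
except json.JSONDecodeError:
    whole_file_ok = False

# Parsing line by line works, as long as blank lines are skipped.
records = [json.loads(line) for line in jsonl_text.split("\n") if line.strip()]
```

So the file has to be parsed one line at a time, which is what a custom dataset would need to do.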
p
Ohhh
I only checked save
image.png
Looks like load works too
l
You will probably have to expand the behaviour in a custom dataset, and hook that one up
p
Wait - but it works. See the screenshot
Aah wait - I made a mistake. It's not supposed to be a list of json objects
Yes, it does error out
l
Maybe you can use the partitioned dataset?
A PartitionedDataset of JSON objects, assuming there is one object per file?
Because I assume you want to process all the JSONL objects in the directory, or am I wrong?
d
so the pandas generic dataset probably works for this
but a text dataset that you parse on both sides could work as well
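A rough sketch of the pandas route, assuming the generic dataset simply forwards its load and save arguments to `pandas.read_json` and `DataFrame.to_json` (that forwarding is an assumption about the dataset, not something checked here). The pandas calls themselves do accept `lines=True`; the sample data is hypothetical.

```python
import io
import json

import pandas as pd

# Hypothetical JSON Lines content: one record per line.
jsonl_text = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'

# lines=True tells pandas to read one JSON document per line.
df = pd.read_json(io.StringIO(jsonl_text), lines=True)

# Writing back as JSON Lines requires orient="records" together with lines=True.
out = df.to_json(orient="records", lines=True)

# Parse the output line by line to confirm it is valid JSON Lines.
round_trip = [json.loads(line) for line in out.split("\n") if line.strip()]
```

This gives you a DataFrame rather than a list of dicts, so it fits best when the records all share the same flat shape.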
i
No guarantee on bugs, but I've used this a few times. Splitting on newlines like I did is probably not the right way, but it worked for my purposes (not sure how it will act if there's a newline in the JSON, although it will probably be escaped).
"""
Dataset definition for a JSONLines dataset.
"""
import json
from typing import List

from kedro.extras.datasets.text import TextDataSet

class JSONLinesDataSet(TextDataSet):
    """Class for handling JSON lines files (.jsonl).
    """

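    def _load(self) -> List[dict]:
        # Parse one JSON document per line. Blank lines (for example a
        # trailing newline at the end of the file) are skipped, because
        # json.loads("") would raise an error.
        return [
            json.loads(line)
            for line in super()._load().split("\n")
            if line.strip()
        ]

    def _save(self, data: List[dict]) -> None:
        # json.dumps escapes newlines inside strings, so each record
        # stays on a single line.
        super()._save("\n".join(json.dumps(record) for record in data))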
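On the open question about newlines inside the JSON: a small standalone check, using only the standard library, that mirrors the join-and-split logic without needing kedro. The sample records are hypothetical.

```python
import json

# Hypothetical records; the first one has a newline inside a string value.
records = [
    {"id": 1, "note": "first line\nsecond line"},
    {"id": 2, "note": "plain"},
]

# json.dumps escapes the embedded newline as a backslash followed by "n",
# so every record still occupies exactly one physical line.
text = "\n".join(json.dumps(record) for record in records)
lines = text.split("\n")

# Splitting on "\n" and parsing each line gives back the original records.
restored = [json.loads(line) for line in lines if line.strip()]
```

So splitting on `"\n"` is safe for files written with `json.dumps`. Files produced by other tools could still contain blank lines or a trailing newline, which is why skipping empty lines on load is worth doing.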