# questions
t
Hello 🙂 Has anyone already implemented a Kedro dataset type that reads text files as an RDD in Spark? (Extra points if you have even done it for XML or KML files 😉.) If not, I would like to know the best (simplest) way to implement such a class from the existing methods/templates in Kedro. Thanks a lot in advance!
n
Can you try passing the format into the load_args?
def load(self, path=None, format=None, schema=None, **options):
    """Loads data from a data source and returns it as a :class:`DataFrame`.

    .. versionadded:: 1.4.0

    Parameters
    ----------
    path : str or list, optional
        optional string or a list of string for file-system backed data sources.
    format : str, optional
        optional string for format of the data source. Default to 'parquet'.
This is an excerpt from the Spark documentation. We use DataFrameReader under the hood, so whatever Spark supports should work out of the box.
t
Thanks a lot @Nok Lam Chan! The problem is that an RDD is not a Spark DataFrame: it is created through SparkContext rather than spark.sql, for example. It is more primitive 😛, but I cannot believe that I am the first one trying to read RDDs in Kedro!
n
Hopefully someone can chime in. I can have a look when I come back next week. I cannot think of anything off the top of my head right now 😛 but I am sure someone has tried. Even if there is no existing implementation, I would suggest looking at the SparkDataset implementation. It shouldn't be too difficult to write your own custom dataset: much of the code there just handles the path and makes sure it works across different storage systems and Databricks. If you are interested in making a PR to add this to the datasets, I am happy to help.
How would you load it with just pure spark? Can you show me a snippet if possible?
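For reference, a custom dataset along the lines suggested above might look roughly like the sketch below. This is only an illustration, not an existing Kedro dataset: the class name TextRDDDataset and the min_partitions argument are made up for the example, it is read-only, and it skips all the path handling for different storage systems and Databricks that SparkDataset takes care of. It assumes a recent Kedro release (where the base class is called AbstractDataset) and uses SparkContext.wholeTextFiles to do the actual read.

```python
from typing import Any, Dict, Optional

try:
    from kedro.io import AbstractDataset  # named AbstractDataSet in older Kedro releases
except ImportError:
    # Fallback only so the sketch can be imported where Kedro is not installed.
    AbstractDataset = object


class TextRDDDataset(AbstractDataset):
    """Hypothetical read-only dataset: loads a directory of text files as an RDD."""

    def __init__(self, filepath: str, min_partitions: Optional[int] = None):
        self._filepath = filepath
        self._min_partitions = min_partitions

    def _load(self):
        # Imported lazily so that constructing the dataset does not require Spark.
        from pyspark.sql import SparkSession

        sc = SparkSession.builder.getOrCreate().sparkContext
        # Yields an RDD of (file path, whole file content) pairs.
        return sc.wholeTextFiles(self._filepath, minPartitions=self._min_partitions)

    def _save(self, data) -> None:
        # Saving is deliberately left out of this sketch.
        raise NotImplementedError("This sketch only supports loading.")

    def _describe(self) -> Dict[str, Any]:
        return {"filepath": self._filepath, "min_partitions": self._min_partitions}
```

In a project, the class would live somewhere importable and be referenced from the catalog by its full dotted path as the entry's type, with filepath pointing at the directory of text files.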
t
Hi Nok! It would be quite simple:
>> textFiles = sc.wholeTextFiles(dirPath)
My point is that Kedro lacks RDD capabilities 😅. Generally speaking, this should be a must when working with Spark (not all big data is structured data). KML or XML files are an example of this. Thanks a lot for your help! 🤗
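To make the KML case concrete: wholeTextFiles gives back (path, content) pairs, so each file's content can be parsed with an ordinary Python function. Below is a minimal sketch of such a function using only the standard library. The function name placemark_names and the sample document are invented for illustration, and it assumes KML 2.2 files that use the standard namespace.

```python
import xml.etree.ElementTree as ET
from typing import List

# Standard KML 2.2 namespace; files using another namespace would need a different mapping.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def placemark_names(kml_text: str) -> List[str]:
    """Return the <name> of every <Placemark> found in one KML document."""
    root = ET.fromstring(kml_text)
    names = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        name = placemark.find("kml:name", KML_NS)
        if name is not None and name.text:
            names.append(name.text.strip())
    return names


# Made-up sample document, standing in for the content of one file.
sample = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark><name>Office</name></Placemark>
    <Placemark><name> Depot </name></Placemark>
    <Placemark></Placemark>
  </Document>
</kml>"""

print(placemark_names(sample))
```

On the Spark side the same function could be applied per file with something like sc.wholeTextFiles(dirPath).mapValues(placemark_names), which keeps the file path as the key.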