Acing Apache Spark RDD Interview Questions Series-2 using PySpark

In this blog, we will discuss converting single-line text to a DataFrame using Apache Spark RDDs.

Single line RDD — DataFrame

We covered RDD basics in Series-1, so here we will go straight to the problem statement.

If you have not read it yet, please go through this link….

Problem Statement:

We have a text file containing a single line. Each record consists of a name and an organization name separated by '#', and all of the records sit on that same single line.

We need to split out each record separately, create two columns (name and organization name) for each record, and convert the result into a DataFrame.
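To make the target shape concrete, here is a minimal plain-Python sketch of the same split, without Spark. The sample line, the names in it, and the comma between records are assumptions made for illustration; the problem statement only tells us that '#' separates name and organization.

```python
# Hypothetical sample input: the real file's record separator is not stated,
# so a comma is assumed here purely for illustration.
sample_line = "Asha#Acme,Ravi#Globex,Mei#Initech"

# Step 1: break the single line into individual records.
# Step 2: break each record into (name, organization) on the first '#'.
rows = [tuple(record.split("#", 1)) for record in sample_line.split(",")]

for name, organization in rows:
    print(name, organization)
```

Each tuple in `rows` corresponds to one row of the final DataFrame, with the first element going to the name column and the second to the organization column.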

Because our input is a text file, we will read it as an RDD.

So let's jump into the code…
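The following is a minimal PySpark sketch of one way to do this, not necessarily the exact code from the original post. It assumes records are separated by a comma (the problem statement only specifies '#' between name and organization), and the helper names `split_record` and `single_line_to_dataframe`, the column names, and the sample data are all hypothetical. In a real job, `spark.sparkContext.textFile(path)` would produce the one-line RDD that `parallelize` stands in for here.

```python
# Assumed delimiters: only '#' comes from the problem statement;
# the record separator is a guess and should match your actual file.
RECORD_SEP = ","
FIELD_SEP = "#"


def split_record(record):
    """Turn one 'name#organization' string into a (name, organization) tuple."""
    name, organization = record.strip().split(FIELD_SEP, 1)
    return (name.strip(), organization.strip())


def single_line_to_dataframe(spark, lines_rdd):
    """Convert an RDD holding the single input line into a two-column DataFrame."""
    records = (
        lines_rdd
        .flatMap(lambda line: line.split(RECORD_SEP))     # one element per record
        .filter(lambda record: FIELD_SEP in record)       # skip empty or malformed pieces
        .map(split_record)                                # (name, organization) tuples
    )
    return spark.createDataFrame(records, schema=["name", "organization"])


if __name__ == "__main__":
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("single-line-to-dataframe")
        .getOrCreate()
    )

    # Hypothetical sample line; with a real file, use
    # spark.sparkContext.textFile("<your path>") instead of parallelize.
    sample_line = "Asha#Acme,Ravi#Globex,Mei#Initech"
    lines_rdd = spark.sparkContext.parallelize([sample_line])

    df = single_line_to_dataframe(spark, lines_rdd)
    df.show()

    spark.stop()
```

The key design choice is `flatMap`: a plain `map` would keep one element per input line (a list of records), while `flatMap` flattens that list so each record becomes its own RDD element and, later, its own DataFrame row.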

With this approach, we can convert the RDD into the desired DataFrame.

That's it for now…

Please clap, share, and comment if you know of any other approaches…

Thanks for reading…


Sairamdgr8 -- An Aspiring Full Stack Data Engineer

Data Engineer @ AWS | SPARK | PYSPARK | SPARK SQL | enthusiast about #DataScience #ML Enthusiastic#NLP#DeepLearning #OpenCV-Face Recognition #ML deployment