Acing Apache Spark Scenario-Based Question Series-5 Using PySpark DataFrames

Joining two tables vertically in PySpark when there is no common column, using the monotonically_increasing_id() and zipWithIndex() functions.

In this scenario, we will discuss different ways to concatenate two tables vertically in PySpark when they do not share a common join column.

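Going through the Databricks documentation, I was able to find only two ways to do this with built-in functions. Both generate a row index for each table, and that index then serves as the join key: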

  1. monotonically_increasing_id() for DataFrames
  2. zipWithIndex() for RDDs

Let's first depict the input DataFrames and the required output DataFrame:

Input and Output Dataframes

Let's check the code…

This code might look clumsy, but it serves the purpose.

Note: If anyone has a better approach to generalizing this code, I'd be happy to embed it in my script.

That’s all for now… Happy learning…

Please do clap and subscribe to my profile… Don’t forget to comment…

Sairamdgr8 -- An Aspiring Full Stack Data Engineer

Data Engineer @ AWS | SPARK | PYSPARK | SPARK SQL | enthusiast about #DataScience #ML | Enthusiastic about #NLP #DeepLearning #OpenCV-Face Recognition #ML deployment