Acing Apache Spark Scenario-based Question Series-5 using PySpark DataFrames
Joining two tables vertically when no common column is involved in PySpark, using the monotonically_increasing_id() and zipWithIndex() functions.
In this scenario we will discuss different ways to concatenate two tables vertically in PySpark when they have no common join column.
Going through the Databricks documentation, I was able to find only two ways to do this with built-in functions:
- monotonically_increasing_id() for DataFrames
- zipWithIndex() for RDDs
First, let me depict the input and the required output DataFrames:
Let's check the code …
This code might look clumsy, but it serves the purpose.
Note: If anyone has a better approach to generalizing this code, I would be happy to include it in my script.
That’s all for now… Happy learning!