Acing Apache Spark DataFrames Interview Questions Series using PySpark with Lead and Lag
In this blog we will look at a scenario-based DataFrame question.
Lead and Lag are two window functions from SQL: Lag returns the value from a preceding row and Lead returns the value from a succeeding row, both within the current row's partition.
Using these functions, we will perform the DataFrame transformations the PySpark way.
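To make the semantics concrete before moving to Spark, here is a small plain-Python sketch of what lag and lead return over a single, already ordered partition. The helper names `lag` and `lead` and the sample numbers are made up for illustration; they are not Spark APIs.

```python
# Plain-Python illustration of lag/lead semantics over one ordered partition.
# These helpers are hypothetical and only mimic the behaviour for offset >= 1.

def lag(values, offset=1, default=None):
    """For each row, return the value `offset` rows before it, else `default`."""
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]

def lead(values, offset=1, default=None):
    """For each row, return the value `offset` rows after it, else `default`."""
    return [values[i + offset] if i + offset < len(values) else default
            for i in range(len(values))]

sales = [100, 120, 90]  # hypothetical monthly sales, ordered by month

print(lag(sales))   # [None, 100, 120]
print(lead(sales))  # [120, 90, None]
```

The first row has no preceding row and the last row has no succeeding row, so those positions get the default value (null in SQL terms).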
Scenario Question:
We need to get the percentage growth in sales for each month compared with the previous month's sales amount.
>>> With the help of the lag function and the percentage formula, (current month's sales - previous month's sales) / previous month's sales * 100, we can achieve this.
If that sounds confusing, the requirement is shown simply and clearly in the picture below.
Let's check the code.
Note: If anyone has a better approach to generalizing this code, I am happy to embed it in my script.
That's all for now. Happy learning!
Please clap and subscribe/follow my profile, and don't forget to comment.