Acing Apache Spark DataFrames Interview Questions Series using PySpark with Lead and Lag

In this blog we will work through a scenario-based DataFrame question.

Lead and Lag are two window functions from SQL. Lag returns the value from a preceding row and Lead returns the value from a succeeding row, both within the current row's partition and ordering.

Using these functions we will perform the DataFrame transformations the PySpark way.

Scenario Question:

We need to get the percentage growth of each month's sales compared to the previous month's sales amount.

With the help of the lag function and the usual percentage calculation we can achieve this: growth % = (current month sales - previous month sales) / previous month sales * 100.

That may sound confusing, so to keep it simple and clear, please check out the requirement in the picture below.

Let's check the code.

Note: If anyone has a better approach to generalizing this code, I am happy to embed it in my script.

That’s all for now. Happy learning!

Please clap and subscribe/follow my profile, and don’t forget to comment.

Sairamdgr8 -- An Aspiring Full Stack Data Engineer

Data Engineer @ AWS | SPARK | PYSPARK | SPARK SQL | Enthusiastic about #DataScience #ML #NLP #DeepLearning #OpenCV-Face Recognition #ML deployment