End-to-End Real-Time ETL of Weather Forecast Data Using Dockerized Airflow, OpenWeather API, Python, and PostgreSQL

--

In this blog we will walk through an end-to-end project that refreshes weather forecast data every day, in near real time.

Project Details:-

We use the OpenWeather service to pull real-time data on a daily basis through its API, with Python as the language, and store the data in PostgreSQL as well as locally for quick data checks.

We orchestrate this end-to-end project with Airflow as the orchestration tool, which we will run inside Docker containers.

Before we dive deep into the code, first download it from the GitHub repository provided.

Along the way we will cover installing Docker with the Airflow image, running Python code on Airflow, and loading data into PostgreSQL.

Installation steps:-

1. Install the Docker application from the official website.

>> https://docs.docker.com/get-docker/

2. Sign up on the OpenWeather website and generate API keys as shown below.

>> https://openweathermap.org/
>> https://home.openweathermap.org/api_keys

3. Install PostgreSQL on your local system and create a database of your own. I have created one as shown below.

>> https://www.postgresql.org/download/

A SQL file is also provided in the code to create the database, schema, and table.
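As a rough illustration of what that SQL sets up, here is a minimal sketch done from Python with psycopg2 (the schema, table, and column names below are assumptions; use the SQL file in the repo for the real definitions):

```python
# Minimal sketch (assumptions: schema/table/column names are illustrative;
# the real definitions live in the SQL file shipped with the repo).
import psycopg2

conn = psycopg2.connect(
    host="localhost",
    dbname="weather_db",       # assumed database name
    user="postgres",
    password="your_password",
)
with conn, conn.cursor() as cur:
    cur.execute("CREATE SCHEMA IF NOT EXISTS weather;")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS weather.daily_forecast (
            city        TEXT,
            latitude    NUMERIC,
            longitude   NUMERIC,
            temperature NUMERIC,
            humidity    NUMERIC,
            description TEXT,
            loaded_at   TIMESTAMP DEFAULT NOW()
        );
    """)
conn.close()
```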

Code Walkthrough:-

The code mainly consists of config files, utility files, a JSON reader, the main Python file, and the Airflow DAG .py file.

Project Structure

Source:-

We will pull data from the OpenWeather API using the API keys we generated.

API keys location in the project
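To show what the source call looks like, here is a minimal sketch of pulling current weather from the OpenWeather API with requests (the coordinates are only an example; in the project the key comes from the API keys properties file):

```python
# Minimal sketch of the OpenWeather call (assumed: the key is read from the
# API keys properties file; Hyderabad coordinates are used only as an example).
import requests

API_KEY = "<your_openweather_api_key>"
url = "https://api.openweathermap.org/data/2.5/weather"
params = {"lat": 17.38, "lon": 78.48, "appid": API_KEY, "units": "metric"}

response = requests.get(url, params=params, timeout=30)
response.raise_for_status()
data = response.json()
print(data["name"], data["main"]["temp"], data["weather"][0]["description"])
```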

ETL Analysis:-

We have two utility files

a) coordinate_mapping.py >> this helps map the cities' coordinates and also cleanses and converts the data according to our needs.

b) json_reader.py >> this reads all the config files: the API keys properties file and the PostgreSQL properties file.
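As a rough idea of what json_reader.py does (the file paths and keys below are assumptions based on the description, not the exact contents of the repo):

```python
# Minimal sketch of a config reader (file paths and keys are illustrative).
import json

def read_json(path):
    """Load a JSON properties file and return it as a dict."""
    with open(path, "r") as f:
        return json.load(f)

api_config = read_json("config/api_keys.json")             # assumed path
db_config = read_json("config/postgres_properties.json")   # assumed path

print(db_config.get("host"), db_config.get("database"))
```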

Target:-

We load the data into PostgreSQL and also write it to the local machine for reference (a minimal load sketch follows the screenshots below).

Storing in PostgreSQL
Local file location
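Here is that minimal sketch, assuming pandas and SQLAlchemy are used for the load (the schema, table, and connection values are illustrative; the actual logic lives in the project code):

```python
# Minimal sketch of the target load (assumptions: pandas + SQLAlchemy are
# used, and the connection values come from the postgres properties file).
import pandas as pd
from sqlalchemy import create_engine

# One example record shaped like the cleansed OpenWeather output.
df = pd.DataFrame([{"city": "Hyderabad", "temperature": 29.4, "humidity": 62}])

engine = create_engine(
    "postgresql+psycopg2://postgres:your_password@localhost:5432/weather_db"
)
df.to_sql("daily_forecast", con=engine, schema="weather", if_exists="append", index=False)

# Also keep a local copy for quick data checks.
df.to_csv("weather_forecast_local.csv", index=False)
```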

When we run this code locally on the terminal

Running locally in the PyCharm IDE

First run the project code in any IDE of your choice and check that all the library dependencies are resolved.

I have provided a requirements.txt file, so it is easy to install all the libraries:
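>> pip install -r requirements.txt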

Python library dependencies

(If required) make any necessary adjustments to the project code, such as changes to the PostgreSQL properties file name, etc.

Airflow DAG:-

To orchestrate the code so it triggers daily on its own at a given time, we will create a DAG that tells the Airflow application how to run our code automatically.

We can see this file in the code.
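For orientation, here is a minimal sketch of what such a DAG could look like, assuming the main ETL entry point is wrapped in a Python callable (the dag_id mirrors the openApiWeatherdag file mentioned below, but the task name, schedule details, and callable are assumptions; the actual DAG is in the repo):

```python
# Minimal sketch of the daily DAG (assumed: the main ETL entry point is a
# function called run_weather_etl; the real DAG lives in the project code).
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def run_weather_etl():
    # In the project this would call the main Python file that pulls the
    # OpenWeather data and loads it into PostgreSQL and the local file.
    pass

default_args = {"owner": "airflow", "retries": 1, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="openApiWeatherdag",
    default_args=default_args,
    schedule_interval="@daily",   # trigger once a day
    start_date=datetime(2023, 1, 1),
    catchup=False,
) as dag:
    extract_and_load = PythonOperator(
        task_id="extract_and_load_weather_data",
        python_callable=run_weather_etl,
    )
```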

PostgreSQL checks:-

Local run:-
First run the code with the PostgreSQL properties file where “host”: “local”,
Docker run:-
Change the properties file to “host”: “host.docker.internal”,
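To make the switch concrete, here is a rough sketch of the two variants of the properties as a Python dict (key names and values other than the host are assumptions; the repo's JSON properties file is the source of truth):

```python
# Rough sketch of the two host settings (all keys except "host" are
# assumptions; only "host" differs between a local run and a Docker run).
postgres_properties = {
    "host": "localhost",            # local run (assumed hostname)
    "port": 5432,
    "database": "weather_db",       # assumed database name
    "user": "postgres",
    "password": "your_password",
}

# When the code runs inside the Airflow containers, point it back at the
# host machine's PostgreSQL instead:
postgres_properties_docker = dict(postgres_properties, host="host.docker.internal")
```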

Once you have installed all the dependencies, run the Docker-Airflow application with the commands below:

1. Open the Docker application.
2. In another tab, open the command prompt and go to the location of the code we downloaded from GitHub.
3. There is a YAML file called docker-compose.yml; just run the command >> docker-compose up -d (runs in the background)

Check that all the dependent containers (worker, scheduler, webserver, and DB) are healthy and up and running >> docker ps

You can also check from the Docker application that all the containers are up and running (all green).

Docker Image
Docker Container

Once everything is OK, go to localhost:8080 in your web browser and you will see the Airflow UI login page. Keep the username and password both as “airflow” so they are easy to remember.

Now, from the project code, we have a .py file called openApiWeatherdag that is visible in the Airflow DAGs section (screenshot).

Now trigger the DAG and check the Tree view for the DAG's run history.

Click on the Graph view of the DAG structure to inspect the task logs.

This code might look clumsy, but it serves the purpose.

Note:- If anyone has a better approach to generalizing the project code, I'm happy to incorporate it.

That’s all for now… Happy learning!

Please clap and Subscribe/follow my profile…Don’t forget to Comment…

--


Sairamdgr8 -- An Aspiring Full Stack Data Engineer

Data Engineer @ AWS | SPARK | PYSPARK | SPARK SQL | enthusiastic about #DataScience #ML #NLP #DeepLearning #OpenCV-Face Recognition #ML deployment