Instagram Data Analytics Project Using PySpark, Snowflake, and Power BI


In this blog we will discuss the architecture and the code of this data analytics project.

I have been spending a good amount of time on Instagram lately… so I kept wondering why I couldn't build some insights into my own activity on Instagram.

So I decided to study my own data for analytics purposes.

Before getting into this project, I did some research on whether to pull the data from the Instagram API (insta bot) or to download it manually from Instagram.
The insights available through API connectivity felt limited to me, so I took the second route.

This may be manual… but if there is a better way to connect and get the full data from Instagram, please feel free to comment on this blog.

OK… let's discuss the project scope…

Before getting into this project, please have Spark set up with Python and the environment variables ready.

First, download the project from GitHub and save it locally.

The project code structure looks like this…

This project mainly consists of four components:

  1. Instagram data as the source
  2. PySpark for data ETL and analysis
  3. Snowflake (cloud-based data storage and analytics service) for loading the data
  4. Power BI for data visualization

Instagram Data:-

>> Go to the Instagram website and log in
>> Click your profile icon in the upper-right hand corner of the screen
>> Click Settings
>> Click Privacy and Security in the left sidebar
>> Scroll down to Data Download
>> Click Request Download
>> Select the profiles you’d like to download information from
>> Select how much information you want to download and tap Next
>> Choose if you want to download your information to a device or directly transfer your information to a destination and tap Next
>> Enter your email address
>> Click Next
>> Enter your account password
>> Select Requests Download

Some pics for reference..

Project code manual steps:-

Once you have downloaded the project from GitHub, a few manual steps are needed for the code to run smoothly.

Once the Instagram source files are downloaded locally, unzip them and add them to the location >>>> insta_Project/src/data/source_data

Config parameters :-
To read the source files, set the parent path to the location of your own downloaded Instagram files, for example
insta_Project/src/data/source_data/instagram-sairamdgr8-2024-02-28-JIlQW2Wf
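
As a rough illustration of how a parent path like this could be read from a config file, here is a minimal sketch using Python's standard configparser. The section name `paths`, the key `parent_path`, and the helper names are my assumptions for the example; the project's actual config.ini may be laid out differently.

```python
import configparser
from pathlib import Path

# Hypothetical config.ini contents. The [paths] section and the
# parent_path key are assumptions, not the project's confirmed layout.
SAMPLE_INI = """
[paths]
parent_path = insta_Project/src/data/source_data/instagram-sairamdgr8-2024-02-28-JIlQW2Wf
"""


def read_parent_path(ini_text: str) -> Path:
    """Parse the ini text and return the configured parent folder."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return Path(parser["paths"]["parent_path"])


def source_file(parent: Path, relative_name: str) -> Path:
    """Build the full path of one exported file under the parent folder."""
    return parent / relative_name


parent = read_parent_path(SAMPLE_INI)
print(parent.name)
```

Keeping the folder name in one config value means only that one line changes when you request a fresh export from Instagram.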

Snowflake setup:-

To write data to Snowflake you need a Snowflake database. Create an account; the free trial lasts 30 days.
a) Create a database, use its public schema, and note the connection and credential details, which we will use to write data to Snowflake.
b) Create the tables in Snowflake… I have provided the CREATE TABLE DDL queries in location >>>> insta_Project/src/sql
c) Provide all the necessary Snowflake connection details at location >>>> insta_Project/src/config/snowflake_config.json
d) Download the Snowflake jars from Maven (the spark-snowflake jar and the Snowflake JDBC jar) and place them in the SPARK_HOME/jars location.
Note:- These jars must be compatible with your Spark version; otherwise you will run into issues when connecting to Snowflake.
I'm using Spark 3.3.1, so I use the jar versions that match it.
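
To make the connection step more concrete, here is a minimal sketch of loading the connection details and writing a DataFrame through the spark-snowflake connector. The option names (`sfURL`, `sfUser`, and so on) and the source name `net.snowflake.spark.snowflake` come from the connector itself. It is my assumption that snowflake_config.json stores those option names directly as keys, and the helper names and sample values are hypothetical.

```python
import json

# Connector option names this sketch expects to find in the JSON file.
# Treating all six as mandatory is a choice made for this example.
REQUIRED_KEYS = ("sfURL", "sfUser", "sfPassword",
                 "sfDatabase", "sfSchema", "sfWarehouse")
SNOWFLAKE_SOURCE = "net.snowflake.spark.snowflake"


def load_snowflake_options(json_text):
    """Parse the JSON text and keep only the connector options."""
    options = json.loads(json_text)
    missing = [key for key in REQUIRED_KEYS if key not in options]
    if missing:
        raise KeyError(f"missing Snowflake settings: {missing}")
    return {key: options[key] for key in REQUIRED_KEYS}


def write_table(df, options, table_name, mode="append"):
    """Write a Spark DataFrame to an existing Snowflake table."""
    (df.write.format(SNOWFLAKE_SOURCE)
        .options(**options)
        .option("dbtable", table_name)
        .mode(mode)
        .save())


# Placeholder values only; replace them with your own account details.
SAMPLE_JSON = """
{
  "sfURL": "<account>.snowflakecomputing.com",
  "sfUser": "<user>",
  "sfPassword": "<password>",
  "sfDatabase": "<database>",
  "sfSchema": "PUBLIC",
  "sfWarehouse": "<warehouse>"
}
"""

options = load_snowflake_options(SAMPLE_JSON)
print(sorted(options))
```

Checking for missing keys up front gives a clear error before Spark starts, instead of a connector failure partway through the run.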

Once the entire setup is ready, please run the project from a shell using this command:

spark-submit .\insta_Project\src\Insta_Data_Profiling.py .\insta_Project\src\config\config.ini

Command run from the shell..
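
The command passes the config file as the first argument after the script name. As a sketch of how a PySpark script can pick that argument up, the snippet below validates the argument and then starts a Spark session. The function names and the app name are hypothetical; this is not the actual content of Insta_Data_Profiling.py.

```python
import sys
from pathlib import Path


def config_path_from_args(argv):
    """Return the config file path passed as the first script argument."""
    if len(argv) != 2:
        raise SystemExit(f"usage: spark-submit {argv[0]} <path-to-config.ini>")
    return Path(argv[1])


def main(argv):
    config_path = config_path_from_args(argv)
    # Imported here so the argument check also works where PySpark is absent.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("insta_data_profiling").getOrCreate()
    try:
        print(f"Using config: {config_path}")
        # Read the source files, transform them, and write to Snowflake here.
    finally:
        spark.stop()


# In a real script the last line would be: main(sys.argv)
print(config_path_from_args(["Insta_Data_Profiling.py", "config.ini"]))
```

Failing fast on a missing argument avoids starting a Spark session only to crash on the first file read.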

Once the code run has finished, connect the Snowflake DB to Power BI and start creating insights…

Power BI steps:-

Install Power BI and create a Power BI account with Microsoft.

a) Once the code run finishes, the data will be loaded into the Snowflake tables. Check that all the tables are populated; then you're good to start your data analysis in Power BI.
b) Connect the Snowflake DB to your Power BI. All the tables created in Snowflake will be imported into your local Power BI.
c) Now create a good Instagram-style background template of your own using any website. I personally use Canva to create a template.
d) Now create some measures for some of the tables to get their overall metrics. Here I will try to attach some screenshots of those.
e) Now add some cards, histogram charts, and buttons to visualize your data.

This code might look clumsy, but it serves the purpose.

Note:- If anyone has a better approach to generalizing the project code, I'm happy to incorporate it.

That’s all for now…Happy Learning….

Please clap and subscribe/follow my profile… Don't forget to comment…


Sairamdgr8 -- An Aspiring Full Stack Data Engineer

Data Engineer @ AWS | SPARK | PYSPARK | SPARK SQL | Enthusiastic about #DataScience #ML #NLP #DeepLearning #OpenCV-Face Recognition #ML deployment