Performing Analysis of Meteorological Data



Data analytics can extract a great deal of information from everyday weather records. Accurate knowledge of weather conditions matters to individuals and organizations alike, and many businesses depend on it.

Accurate decisions require correct data. Weather data is one of the easier kinds of data to find on the internet: many sites provide historical records of meteorological parameters such as pressure, temperature, humidity, wind speed, visibility, etc.


Terminologies:-

Data analytics is a discipline focused on extracting insights from data. It comprises the processes, tools, and techniques of data analysis and management, including the collection, organization, and storage of data.


Meteorological Data:-

Meteorological data consists of physical parameters that are measured directly by instruments, including temperature, dew point, wind direction, wind speed, cloud cover, cloud layer(s), ceiling height, visibility, current weather, and precipitation amount.


Apparent Temperature:-

Apparent temperature is the temperature equivalent perceived by humans, caused by the combined effects of air temperature, relative humidity, and wind speed. The measure is most commonly applied to the perceived outdoor temperature.


Humidity:- 

Humidity is the amount of water vapor in the air.


Objective:-

The main objective is to clean the data, analyze it to test the influence of global warming on temperature and humidity, and finally put forth a conclusion.


Given Hypothesis:-

The Null Hypothesis H0 is “Do the Apparent Temperature and Humidity, compared monthly across 10 years of data, indicate an increase due to global warming?”

To test H0, we need to find whether the average Apparent Temperature for a given month, say April, and the average Humidity for the same month have increased from 2006 to 2016.

Dataset:-

The dataset can be obtained from Kaggle. It contains hourly weather records covering roughly 10 years, from 2006-04-01 00:00:00.000 +0200 to 2016-09-09 23:00:00.000 +0200. It corresponds to Finland, a country in Northern Europe.

Source URL: https://www.kaggle.com/muthuj7/weather-dataset

-:Data Analysis:-

Import Libraries:-

We will be using several Python libraries such as Pandas, NumPy, seaborn, and matplotlib.

Load and Read Dataset

Load the dataset using read_csv(), since the dataset is in CSV form, and display the first 5 rows using head().
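As a minimal sketch of this step, the snippet below reads a few made-up rows from an in-memory string instead of the downloaded file. The column names and the file name mentioned in the comment are assumptions about the Kaggle CSV, and the values are illustrative only.

```python
import io

import pandas as pd

# Stand-in for the Kaggle file: a few made-up rows in an assumed column layout.
SAMPLE_CSV = """Formatted Date,Summary,Apparent Temperature (C),Humidity
2006-04-01 00:00:00.000 +0200,Partly Cloudy,7.4,0.89
2006-04-01 01:00:00.000 +0200,Partly Cloudy,7.2,0.86
2006-04-01 02:00:00.000 +0200,Mostly Cloudy,9.4,0.89
2006-04-01 03:00:00.000 +0200,Partly Cloudy,5.9,0.83
2006-04-01 04:00:00.000 +0200,Mostly Cloudy,7.0,0.83
2006-04-01 05:00:00.000 +0200,Partly Cloudy,7.1,0.85
"""

# With the real download this would be something like
# pd.read_csv("weatherHistory.csv") (the file name is an assumption).
data = pd.read_csv(io.StringIO(SAMPLE_CSV))

# head() returns the first 5 rows by default.
first_rows = data.head()
print(first_rows)
```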


The dimensions of the data frame can be obtained from data.shape. Note that shape is an attribute, not a method, so it takes no parentheses.
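A small illustration on a toy data frame (the values are made up and the column names are assumptions about the dataset):

```python
import pandas as pd

# Toy frame standing in for the weather data.
data = pd.DataFrame({
    "Apparent Temperature (C)": [7.4, 7.2, 9.4],
    "Humidity": [0.89, 0.86, 0.89],
})

# shape is a (rows, columns) tuple; calling data.shape() would raise a TypeError.
n_rows, n_cols = data.shape
print(f"{n_rows} rows, {n_cols} columns")
```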


The total number of rows and columns in the data set is 24508 and 12 respectively.

The data.info() function gives a concise summary of the data frame: column names, non-null counts, and data types. It comes in handy when doing exploratory analysis of the data.



Statistical details of the data frame can be obtained using the describe() function.
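The two calls can be sketched together on a toy data frame. The column names are assumptions and the values are made up; on the real dataset the same calls would be applied to the loaded frame.

```python
import pandas as pd

# Toy frame with one text column and two numeric columns.
data = pd.DataFrame({
    "Summary": ["Clear", "Overcast", "Clear"],
    "Apparent Temperature (C)": [5.0, 10.0, 15.0],
    "Humidity": [0.9, 0.8, 0.7],
})

# info() prints column names, non-null counts and dtypes.
data.info()

# describe() returns count, mean, std, min, quartiles and max
# for the numeric columns only (by default).
stats = data.describe()
print(stats)
```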


Check for missing values, for example with isnull().sum(), which counts the missing entries in each column.
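One common way to do this check is sketched below on a toy frame with deliberately missing entries. The column names are assumptions, and the original notebook may have used a different call.

```python
import numpy as np
import pandas as pd

# Toy frame with two deliberately missing entries.
data = pd.DataFrame({
    "Precip Type": ["rain", None, "snow"],
    "Apparent Temperature (C)": [7.4, 7.2, 9.4],
    "Humidity": [0.89, 0.86, np.nan],
})

# isnull() marks missing cells as True; sum() counts them per column.
missing_per_column = data.isnull().sum()
print(missing_per_column)
```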



Change the ‘Formatted Date’ feature from string to datetime using the pandas to_datetime() function.
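A minimal sketch of the conversion follows. Passing utc=True is an assumption about how the notebook handled the UTC offsets; it converts every timestamp to UTC, which keeps the column a single datetime type even if the offsets differ between rows (the second row below uses +0100 purely to illustrate that case).

```python
import pandas as pd

# Two timestamps in the format used by the dataset.
data = pd.DataFrame({
    "Formatted Date": [
        "2006-04-01 00:00:00.000 +0200",
        "2006-11-01 00:00:00.000 +0100",
    ]
})

# Parse the strings and normalise them to UTC.
data["Formatted Date"] = pd.to_datetime(data["Formatted Date"], utc=True)
print(data["Formatted Date"])
```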

We can set “Formatted Date” as an index using the set_index() function which sets the DataFrame index (row labels) using one or more existing columns.

We then resample the data to monthly averages. The resampled data is displayed using data_monthly_mean.head(), which shows the first rows of the data frame. We will use this resampled dataset to test the hypothesis.
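The indexing and resampling steps can be sketched as follows. This uses synthetic daily values rather than the real hourly data, and the "MS" (month start) resampling rule and the column names are assumptions; only the name data_monthly_mean comes from the text above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: constant values within each month so the means are obvious.
dates = pd.date_range("2006-04-01", "2006-05-31", freq="D", tz="UTC")
data = pd.DataFrame({
    "Formatted Date": dates,
    "Apparent Temperature (C)": np.where(dates.month == 4, 10.0, 20.0),
    "Humidity": np.where(dates.month == 4, 0.8, 0.6),
})

# Use the timestamp column as the row index.
data = data.set_index("Formatted Date")

# Average the two columns of interest within each calendar month.
columns = ["Apparent Temperature (C)", "Humidity"]
data_monthly_mean = data[columns].resample("MS").mean()
print(data_monthly_mean.head())
```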


Observation:-

There is a linear relation between “Apparent Temperature” and “Humidity” with a negative slope. As air temperature increases, air can hold more water vapor, so for a given amount of moisture its relative humidity decreases. When temperatures drop, relative humidity increases.

We use the lineplot() function to plot the variation of Apparent Temperature and Humidity over time.




Observation:-

The above graph displays the monthly average Apparent Temperature and Humidity for all 12 months over the 10 years, i.e., from 2006 to 2016.

From the above plot,

  1. “Humidity” remained nearly constant from 2006 to 2016.
  2. “Apparent Temperature” varied periodically from 2006 to 2016, with roughly constant amplitude.


For a specific month

Relation between Apparent Temperature & Humidity Using Regplot

We can use the regplot() function to plot the relationship between “Apparent Temperature” and “Humidity”.
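A minimal sketch with synthetic data follows. The negative relationship is built into the toy values, so the fitted slope only illustrates the mechanics; it is not a result from the dataset. The column names are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data: humidity falls as temperature rises, plus a little noise.
rng = np.random.default_rng(0)
temp = np.linspace(-5, 30, 60)
df = pd.DataFrame({
    "Apparent Temperature (C)": temp,
    "Humidity": 0.9 - 0.01 * temp + rng.normal(0, 0.02, temp.size),
})

# regplot draws the scatter points plus a fitted regression line.
fig, ax = plt.subplots()
sns.regplot(x="Apparent Temperature (C)", y="Humidity", data=df, ax=ax)

# The same straight-line fit, computed explicitly to read off the slope.
slope, intercept = np.polyfit(df["Apparent Temperature (C)"], df["Humidity"], 1)
print(f"slope = {slope:.4f}")
```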


Correlation between Apparent temperature & Humidity

Pair plot for correlation of Apparent temperature & Humidity

We can use the pairplot() function to plot the correlation between “Apparent Temperature” and “Humidity”.



We use a FacetGrid object with a scatterplot to plot Temperature against Humidity for each weather summary type.

A FacetGrid object takes a data frame as input, along with the names of the variables that will form the row, column, or hue dimensions of the grid. The variables should be categorical, and the data at each level of the variable will be used for a facet along that axis.

Observation:-

  1. There are very few outliers.
  2. The weather in Finland is mostly Clear or Partly Cloudy/Rainy.
  3. Only a few days have Light Rain, Dry, or Dangerously Windy and Partly Cloudy conditions.

Univariate Analysis

“Uni” means one and “variate” means variable, so univariate analysis is the analysis of one variable or one feature. It tells us how the data in each feature is distributed.
In univariate analysis, we use histograms to analyze and visualize the frequency distribution. Plotting histograms in pandas is easy and straightforward.
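For instance, histograms of the two features can be drawn directly from the pandas columns. The data here is randomly generated for illustration, and the column names and bin count are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Randomly generated stand-in values.
rng = np.random.default_rng(2)
data = pd.DataFrame({
    "Apparent Temperature (C)": rng.normal(10, 8, 500),
    "Humidity": rng.uniform(0.3, 1.0, 500),
})

# One histogram per feature, side by side.
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
data["Apparent Temperature (C)"].hist(ax=axes[0], bins=10)
axes[0].set_title("Apparent Temperature (C)")
data["Humidity"].hist(ax=axes[1], bins=10)
axes[1].set_title("Humidity")
```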

Univariate Analysis For Apparent Temperature :-



Univariate Analysis For Humidity:-

Observation: “Humidity” is a better Feature than “Apparent Temperature ”

Function for plotting Humidity & Apparent Temperature for all months



This function helps to analyze the variations in Apparent Temperature and Humidity for all months over the 10 years.
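One possible shape for such a function is sketched below; the original implementation is not shown here, so the name plot_all_months, its parameters, and the column names are hypothetical. It pivots the monthly means so that each calendar month becomes one line across the years, and it is exercised on synthetic data.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_all_months(monthly, column, ax=None):
    """Plot one line per calendar month showing `column` across the years."""
    if ax is None:
        _, ax = plt.subplots(figsize=(14, 6))
    # Rows become years, columns become calendar months (1-12).
    tidy = monthly.assign(year=monthly.index.year, month=monthly.index.month)
    table = tidy.pivot(index="year", columns="month", values=column)
    table.plot(ax=ax, marker="o")
    ax.set_xlabel("Year")
    ax.set_ylabel(column)
    return table, ax


# Synthetic monthly means covering three full years.
months = pd.date_range("2006-01-01", periods=36, freq="MS")
temp = 10 + 12 * np.sin(2 * np.pi * (months.month.to_numpy() - 4) / 12)
monthly = pd.DataFrame(
    {"Apparent Temperature (C)": temp, "Humidity": 0.75 - 0.01 * temp},
    index=months,
)

table, ax = plot_all_months(monthly, "Apparent Temperature (C)")
```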

This graph shows the changes in Temperature for each month from 2006 to 2016.

Function for plotting Humidity & Apparent Temperature for each month


This function helps to analyze the variations in Apparent Temperature and Humidity for each month over the 10 years.
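A per-month helper might look like the sketch below. The name plot_month and its parameters are hypothetical, the column names are assumptions, and the data is synthetic. It keeps only the rows for one calendar month and plots both variables across the years.

```python
import calendar

import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_month(monthly, month, ax=None):
    """Plot both variables for a single calendar month (1-12) across years."""
    if ax is None:
        _, ax = plt.subplots(figsize=(14, 6))
    # Keep only the rows whose index falls in the requested month.
    subset = monthly[monthly.index.month == month]
    subset.plot(ax=ax, marker="o")
    ax.set_title(calendar.month_name[month])
    return subset, ax


# Synthetic monthly means covering three full years.
months = pd.date_range("2006-01-01", periods=36, freq="MS")
temp = 10 + 12 * np.sin(2 * np.pi * (months.month.to_numpy() - 4) / 12)
monthly = pd.DataFrame(
    {"Apparent Temperature (C)": temp, "Humidity": 0.75 - 0.01 * temp},
    index=months,
)

april, ax = plot_month(monthly, 4)
print(april)
```

Calling the helper once for each value from 1 to 12 would produce one graph per month.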

The graphs below show the variations in Apparent Temperature and Humidity for each month from 2006 to 2016.

January

February
March
April
May
June
July
August
September
October
November
December


The above graphs show many ups and downs in temperature, while the average humidity has remained nearly constant throughout the 10 years.


Conclusion:

From this analysis, we can conclude that the Apparent Temperature and Humidity, compared monthly across 10 years of data, indicate an increase due to global warming.

Source code: https://github.com/mukunddholariya/suvenconsultants/blob/main/Performing%20Analysis%20of%20Meteorological%20Data/Meteorological_Data.ipynb

I am thankful to mentors at https://internship.suvenconsultants.com for providing awesome problem statements and giving many of us a Coding Internship Experience. Thank you www.suvenconsultants.com


