Predictive Analysis on Climate Change and Impacts in Africa
- Michael Olaniyi Jeremiah

- Sep 21, 2024
Updated: Aug 15


Overview
According to the United Nations, climate change refers to long-term shifts in temperatures and weather patterns. Such shifts can be natural, due to changes in the sun’s activity or large volcanic eruptions. But since the 1800s, human activities have been the main driver of climate change, primarily due to the burning of fossil fuels like coal, oil, and gas. This project seeks to predict the `CO2` levels in each African region in the year 2025 and to determine whether `CO2` levels affect annual `temperature` in the selected African countries.
Problem
The consequences of climate change now include, among others, intense droughts, water scarcity, severe fires, rising sea levels, flooding, melting polar ice, catastrophic storms, and declining biodiversity.
Analysis Conclusion Summary
In conclusion, the regression model provides insights into how CO2 levels and the countries' locations in Africa impact annual temperatures. It suggests that higher CO2 levels are associated with higher temperatures.
Dataset
The dataset, IEA-EDGAR CO2, is a component of the EDGAR (Emissions Database for Global Atmospheric Research) Community GHG database version 7.0 (2022) including or based on data from IEA (2021) Greenhouse Gas Emissions from Energy, www.iea.org/statistics, as modified by the Joint Research Centre. The data source was the EDGARv7.0_GHG website provided by Crippa et al. (2022) and with DOI.
The dataset contains three sheets (IPCC 2006, IPCC 1996, and TOTALS BY COUNTRY) on the amount of CO2 (a greenhouse gas) generated by countries between 1970 and 2021.
You can download the dataset from your workspace or inspect the dataset directly here.
TOTALS BY COUNTRY SHEET
This sheet contains the annual CO2 (kt) produced between 1970 - 2021 in each country. The relevant columns in this sheet are:

IPCC 2006
This sheet contains the amount of CO2 by country and the industry responsible.

Table of contents
Clean and tidy the datasets.
Create a line plot to show the trend of `CO2` levels across the African regions.
Determine the relationship between time (`Year`) and `CO2` levels across the African regions.
Determine if there is a significant difference in the `CO2` levels among the African Regions.
Determine the most common (top 5) industries in each African region.
Determine the industry responsible for the most amount of CO2 (on average) in each African Region.
Predict the `CO2` levels (at each African region) in the year 2025.
Determine if `CO2` levels affect annual `temperature` in the selected African countries.
Set up all required libraries
# Setup
import pandas as pd
import numpy as np
import pingouin
from sklearn.linear_model import LinearRegression
from statsmodels.regression.linear_model import OLS
import seaborn as sns
import matplotlib.pyplot as plt
import inspect
plt.style.use('ggplot')
# The sheet names containing our datasets
sheet_names = ['IPCC 2006', 'TOTALS BY COUNTRY']
# The column names of the dataset start on row 11,
# so skip the first 10 rows
datasets = pd.read_excel('IEA_EDGAR_CO2_1970-2021.xlsx', sheet_name = sheet_names, skiprows = 10)
# we need only the African regions
african_regions = ['Eastern_Africa', 'Western_Africa', 'Southern_Africa', 'Northern_Africa']
ipcc_2006_africa = datasets['IPCC 2006'].query('C_group_IM24_sh in @african_regions')
totals_by_country_africa = datasets['TOTALS BY COUNTRY'].query('C_group_IM24_sh in @african_regions')
Read the temperatures dataset, which contains four African countries, one from each African region:
Nigeria: West Africa
Ethiopia: East Africa
Tunisia: North Africa
Mozambique: Southern Africa
temperatures = pd.read_csv(r"C:\Users\Hp g6\Downloads\Data Info\Predictive Analysis on Climate Change and Impacts in Africa\temperatures.csv")
Task 1: Clean and tidy the datasets
Rename C_group_IM24_sh to Region, Country_code_A3 to Code, and ipcc_code_2006_for_standard_report_name to Industry in the corresponding African datasets.
Drop IPCC_annex, ipcc_code_2006_for_standard_report, and Substance from the corresponding datasets.
Melt Y_1970 to Y_2021 into two columns, Year and CO2. Drop rows where CO2 is missing.
Convert Year to int type.
# Rename columns (assign the result instead of modifying the filtered frame in place)
ipcc_2006_africa = ipcc_2006_africa.rename(columns={'C_group_IM24_sh':'Region','Country_code_A3':'Code','ipcc_code_2006_for_standard_report_name':'Industry'})
# Drop columns
ipcc_2006_africa = ipcc_2006_africa.drop(columns=['IPCC_annex','ipcc_code_2006_for_standard_report','Substance'])
# Melt data: the first five columns are identifiers, the Y_<year> columns become rows
id_columns = ipcc_2006_africa.columns[:5].tolist()
ipcc_2006_africa = pd.melt(ipcc_2006_africa, id_vars=id_columns, var_name='Year', value_name='CO2')
# Strip the 'Y_' prefix and convert Year to integer
ipcc_2006_africa['Year'] = ipcc_2006_africa['Year'].str.split('_').str[-1].astype(int)
# Drop rows with missing values
ipcc_2006_africa = ipcc_2006_africa.dropna(subset=['CO2'])
# Check output
ipcc_2006_africa
Output: Cleaned data

# Rename columns (assign the result instead of modifying the filtered frame in place)
totals_by_country_africa = totals_by_country_africa.rename(columns={'C_group_IM24_sh':'Region','Country_code_A3':'Code'})
# Drop columns
totals_by_country_africa = totals_by_country_africa.drop(columns=['IPCC_annex','Substance'])
# Melt dataset: the first three columns are identifiers, the Y_<year> columns become rows
id_columns = totals_by_country_africa.columns[:3].tolist()
totals_by_country_africa = pd.melt(totals_by_country_africa, id_vars=id_columns, var_name='Year', value_name='CO2')
# Strip the 'Y_' prefix and convert Year to integer
totals_by_country_africa['Year'] = totals_by_country_africa['Year'].str.split('_').str[-1].astype(int)
# Drop rows with missing values
totals_by_country_africa = totals_by_country_africa.dropna(subset=['CO2'])
# Check output
totals_by_country_africa
Output: Cleaned totals by country Africa
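Both cleaning blocks apply the same melt, year-parsing, and drop-missing steps. As a minimal sketch, those steps could be wrapped in one function. The helper name `tidy_emissions` and the toy values below are assumptions made for illustration; they are not part of the original analysis or the EDGAR data.

```python
import pandas as pd


def tidy_emissions(df: pd.DataFrame, id_cols: list) -> pd.DataFrame:
    """Reshape wide Y_<year> columns into long Year/CO2 rows (hypothetical helper)."""
    long_df = df.melt(id_vars=id_cols, var_name="Year", value_name="CO2")
    # "Y_1970" -> 1970
    long_df["Year"] = long_df["Year"].str.split("_").str[-1].astype(int)
    # Keep only rows that have a recorded CO2 value
    return long_df.dropna(subset=["CO2"]).reset_index(drop=True)


# Toy frame with made-up values, only to show the reshape
toy = pd.DataFrame({
    "Region": ["Western_Africa", "Eastern_Africa"],
    "Name": ["Nigeria", "Ethiopia"],
    "Y_1970": [1.0, None],
    "Y_1971": [2.0, 3.0],
})
tidy = tidy_emissions(toy, ["Region", "Name"])
print(tidy)
```

The missing 1970 value for the second country is dropped, so the toy result has three rows instead of four.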

Task 2: Show the trend of `CO2` levels across the African regions
Using totals_by_country_africa, create a line plot of CO2 vs. Year in each Region to show the trend of CO2 levels by year.
# Line chart
sns.set_style('darkgrid')
# errorbar=None hides the confidence band (seaborn >= 0.12; older versions use ci=None)
sns.lineplot(x='Year', y='CO2', data=totals_by_country_africa, hue='Region', errorbar=None)
plt.ylabel('CO2 (kt)')
plt.ylim([0, 110000])
plt.title('CO2 levels across the African Regions between 1970 and 2021');
Output: CO2 levels across the African Regions between 1970 and 2021

Task 3: Determine the relationship between time (`Year`) and `CO2` levels across the African regions
Using the totals_by_country_africa dataset, conduct a Spearman's correlation to determine the relationship between time (Year) and CO2 within each African Region.
Save the results in a variable called relationship_btw_time_CO2.
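# Spearman correlation between Year and CO2 within each region
# (select the numeric columns so the text columns are not passed to corr)
relationship_btw_time_CO2 = totals_by_country_africa.groupby('Region')[['Year', 'CO2']].corr(method='spearman')
relationship_btw_time_CO2
Output: Relationship between time CO2 by region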
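Spearman's coefficient is the Pearson correlation computed on ranks, so it measures whether CO2 moves monotonically with time rather than linearly. The sketch below illustrates this on made-up numbers (the regions "A" and "B" and their values are assumptions), returning one coefficient per region instead of a full correlation matrix.

```python
import pandas as pd

# Made-up values: region A rises every year, region B falls every year
toy = pd.DataFrame({
    "Region": ["A"] * 4 + ["B"] * 4,
    "Year": [2000, 2001, 2002, 2003] * 2,
    "CO2": [1.0, 5.0, 6.0, 50.0, 9.0, 7.0, 4.0, 1.0],
})

# Rank both variables within each region, then take the Pearson correlation
# of the ranks; this is the definition of Spearman's rho
rho_by_region = toy.groupby("Region")[["Year", "CO2"]].apply(
    lambda g: g["Year"].rank().corr(g["CO2"].rank())
)
print(rho_by_region)
```

Region A increases unevenly (1, 5, 6, 50), yet its coefficient is still 1 because only the ordering matters.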

Task 4: Determine if there is a significant difference in the CO2 levels among the African Regions
Using totals_by_country_africa, conduct an ANOVA using pingouin.anova() on CO2 by Region. Save the results as aov_results.
Conduct a post hoc test (with Bonferroni correction) using pingouin.pairwise_tests() to find the source of the significant difference. Save the results as pw_ttest_result.
Is it true that the CO2 levels of the Southern_Africa and Northern_Africa regions do not differ significantly? The previous task should provide you with the answer.
# Analysis of Variance (ANOVA)
aov_results = pingouin.anova(dv='CO2', data=totals_by_country_africa, between='Region')
# Pairwise t-tests with Bonferroni correction
# (pairwise_tests replaces the deprecated pairwise_ttests)
pw_ttest_result = pingouin.pairwise_tests(dv='CO2', data=totals_by_country_africa, between='Region', padjust="bonf").round(3)
# Results of pairwise t-tests
pw_ttest_result
Output: Analysis of Variance (ANOVA)
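To show what the Bonferroni correction does, here is a rough sketch that swaps pingouin for SciPy: a one-way ANOVA followed by pairwise Welch t-tests whose p-values are multiplied by the number of comparisons and capped at 1. The three groups are synthetic, and the numbers are not expected to match the pingouin output above.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Synthetic groups: "low" and "mid" are close together, "high" is far away
rng = np.random.default_rng(0)
groups = {
    "low": rng.normal(10.0, 1.0, 30),
    "mid": rng.normal(10.2, 1.0, 30),
    "high": rng.normal(15.0, 1.0, 30),
}

# One-way ANOVA: is there any difference among the group means?
f_stat, p_anova = stats.f_oneway(*groups.values())

# Pairwise Welch t-tests, Bonferroni-adjusted for the number of pairs
pairs = list(combinations(groups, 2))
adjusted = {}
for a, b in pairs:
    p_raw = stats.ttest_ind(groups[a], groups[b], equal_var=False).pvalue
    adjusted[(a, b)] = min(p_raw * len(pairs), 1.0)

print(p_anova, adjusted)
```

A significant ANOVA only says that some group differs; the adjusted pairwise p-values identify which pairs drive the difference.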

Task 5: Determine the most common (top 5) industries in each African region
Group the ipcc_2006_africa data by Region and Industry.
Count the occurrences of each Industry within each Region and name it Count.
Sort the data within each region group by `Count` in descending order.
Get the top 5 industries for each region.
Save the result to the variable `top_5_industries`.
# Count the occurrences of each combination of 'Region' and 'Industry'
count = ipcc_2006_africa.groupby(['Region', 'Industry']).size().reset_index(name='Count')
# Sort the counts in descending order within each region
count = count.sort_values(['Region', 'Count'], ascending=False)
# Top 5 industries with the highest counts within each region
top_5_industries = count.groupby('Region').head(5).reset_index(drop=True)
top_5_industries
Output: Number of occurrences of each combination of 'Region' and 'Industry'

Task 6: Determine the industry responsible for the most amount of CO2 (on average) in each African Region
Group the ipcc_2006_africa data by Region and Industry.
Calculate the average CO2 emissions for each group.
Find the Industry with the maximum average CO2 emissions in each region.
# Calculate the average CO2 emissions for each combination of 'Region' and 'Industry'
average = ipcc_2006_africa.groupby(['Region', 'Industry']).CO2.mean().reset_index()
# Sort the averages in descending order within each region
average = average.sort_values(['Region', 'CO2'], ascending=[True, False])
# Select the single industry with the highest average CO2 emissions within each region
# (stored under its own name so the Task 5 result top_5_industries is not overwritten)
top_industry_per_region = average.groupby('Region').head(1).reset_index(drop=True)
# Check result
top_industry_per_region
Output: Average CO2 emissions for each combination of 'Region' and 'Industry'

# Calculate average CO2 emissions for each 'Region' and 'Industry' combination
average = ipcc_2006_africa.groupby(['Region', 'Industry'])['CO2'].mean().reset_index()
# Industries with maximum average CO2 emissions within each region
max_co2_industries = average.loc[average.groupby('Region')['CO2'].idxmax()].reset_index(drop=True)
# Check results
max_co2_industries
Output: Average CO2 emissions for each 'Region' and 'Industry' combination
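The two approaches above (sort then take the first row per region, and `idxmax` per region) should select the same rows. A small check on made-up data, with hypothetical region and industry labels, shows this:

```python
import pandas as pd

# Made-up emissions, only to compare the two selection strategies
toy = pd.DataFrame({
    "Region": ["North", "North", "North", "South", "South"],
    "Industry": ["Cement", "Transport", "Transport", "Cement", "Transport"],
    "CO2": [10.0, 30.0, 50.0, 8.0, 2.0],
})
average = toy.groupby(["Region", "Industry"])["CO2"].mean().reset_index()

# Strategy 1: sort within each region, keep the first row
by_sort = (
    average.sort_values(["Region", "CO2"], ascending=[True, False])
    .groupby("Region")
    .head(1)
    .reset_index(drop=True)
)

# Strategy 2: look up the row label of the maximum in each region
by_idxmax = average.loc[average.groupby("Region")["CO2"].idxmax()].reset_index(drop=True)

print(by_sort)
print(by_idxmax)
```

The `idxmax` version avoids a full sort, while the sort-based version extends naturally to a top-N selection by changing `head(1)`.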

Task 7: Predict the `CO2` levels (at each African region) in the year 2025
Create an instance of LinearRegression() and save it as reg.
Fit a linear model of CO2 (in log base 10) by Year and Region using reg.fit().
Predict the values of CO2 using the reg.predict() and the data provided. Save the result as predicted_co2.
Convert predicted_co2 values from log base 10 to decimals and round to 2 d.p using np.round().
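The log-transform and back-transform steps can be illustrated on a single synthetic series. This sketch uses `np.polyfit` in place of `LinearRegression` and a made-up series growing 3% per year; both are assumptions chosen to keep the example small, not the model fitted in this project.

```python
import numpy as np

years = np.arange(1970, 2022)
# Synthetic series: made-up starting value, growing 3% per year
co2 = 1000.0 * 1.03 ** (years - 1970)

# Exponential growth is a straight line on the log10 scale
slope, intercept = np.polyfit(years, np.log10(co2), deg=1)

# Predict on the log scale, then undo the transform with 10**x
log_pred_2025 = slope * 2025 + intercept
pred_2025 = np.round(10 ** log_pred_2025, 2)
print(pred_2025)
```

Fitting on the log scale means the slope describes a constant percentage change per year, and the back-transformed prediction can never be negative.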
# Linear Regression model
reg = LinearRegression()
# The target: base-10 logarithm of CO2 emissions
lin_fit = np.log10(totals_by_country_africa['CO2'])
# One-hot encode the categorical 'Region' column ('Year' stays numeric)
features = pd.get_dummies(totals_by_country_africa[['Year', 'Region']])
# Fit the Linear Regression model using the features and the log-transformed target
reg.fit(features, lin_fit)
# Create a new DataFrame with 'Year' and 'Region' for the prediction
newdata = pd.DataFrame({'Year': 2025, 'Region': african_regions})
# One-hot encode 'Region' and match the column order used for fitting
newdata = pd.get_dummies(newdata).reindex(columns=features.columns, fill_value=0)
# Predict log10(CO2) for the new data
predicted_co2 = reg.predict(newdata)
# Transform predicted values back to the original scale using 10^x
predicted_co2 = np.round(10**predicted_co2, 2)
Task 8: Determine if CO2 levels affect annual temperature in the selected African countries
Select Name, Year and CO2 for the countries in the countries list. Save the result as selected_countries.
Convert temperatures dataset from wide to long format. Set the new column names to Name and Temperature. Save the result as temp_long.
Perform an inner join between selected_countries and temp_long on Name and Year. Save the result as joined.
Create a linear model of Temperature by CO2 and Name. Save the result as model_temp.
A one unit rise in log10 CO2 leads to how many degrees rise in temperature? Run model_temp.summary() to find out!
What is the adjusted R squared value of the model?
# List of African countries
countries = ["Ethiopia", "Mozambique", "Nigeria", "Tunisia"]
# Select rows from totals_by_country_africa DataFrame
selected_countries = totals_by_country_africa[['Name', 'Year', 'CO2']][totals_by_country_africa['Name'].isin(countries)]
# Reshape temperature data from wide to long format
temp_long = temperatures.melt(id_vars=['Year'], value_vars=countries, var_name='Name', value_name='Temperature')
# Merge selected CO2 emissions and temperature data based on Name and Year
joined = pd.merge(selected_countries, temp_long, on=['Name', 'Year'], how='inner')
# Ordinary Least Squares (OLS) regression of Temperature on log10(CO2) and country
model_temp = OLS.from_formula("Temperature ~ np.log10(CO2) + Name", data=joined).fit()
# Show summary
model_temp.summary()
Output: Model summary
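The formula `Temperature ~ np.log10(CO2) + Name` fits an intercept, a slope for log10(CO2), and one dummy variable per country except the reference category (the first country alphabetically). The sketch below rebuilds that design matrix by hand with NumPy on made-up, noise-free data; the offsets and the slope of 2 are assumptions chosen so the recovered coefficients are easy to read.

```python
import numpy as np
import pandas as pd

# Made-up data: Temperature = 20 + 2 * log10(CO2) + country offset, no noise
offsets = {"Ethiopia": 0.0, "Mozambique": 1.5, "Nigeria": 3.0, "Tunisia": -2.0}
rows = [
    {"Name": name, "CO2": co2, "Temperature": 20 + 2 * np.log10(co2) + off}
    for name, off in offsets.items()
    for co2 in (100.0, 1_000.0, 10_000.0)
]
toy = pd.DataFrame(rows)

# Design matrix: intercept, log10(CO2), and one dummy per non-reference country
dummies = pd.get_dummies(toy["Name"], drop_first=True).astype(float)
X = np.column_stack([np.ones(len(toy)), np.log10(toy["CO2"]), dummies.to_numpy()])
coef, *_ = np.linalg.lstsq(X, toy["Temperature"].to_numpy(), rcond=None)

labels = ["Intercept", "log10(CO2)"] + list(dummies.columns)
result = dict(zip(labels, np.round(coef, 6)))
print(result)
```

Each country coefficient is an offset relative to the reference country, and a one-unit rise in log10(CO2) corresponds to a tenfold increase in CO2, which is how the slope should be read.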

Findings and Conclusion:
The linear regression model was developed to understand the relationship between CO2 levels (log10-transformed), the country (Ethiopia, Mozambique, Nigeria, or Tunisia), and the annual temperatures in those countries. The analysis produced several key findings:
Coefficient Interpretation: The coefficients associated with each country indicate their impact on annual temperatures when compared to the reference category, Ethiopia. Mozambique and Nigeria are associated with an increase in temperature, while Tunisia is associated with a decrease.
CO2 Impact: The coefficient for log10(CO2) suggests a positive and significant relationship between CO2 levels and temperature. An increase in log10-transformed CO2 levels is associated with higher temperatures.
Statistical Tests: The Omnibus test is not significant. In the statsmodels summary this is a test of residual normality rather than of overall model fit (the F-statistic reports overall fit), so it agrees with the Jarque-Bera test in suggesting that the residuals are approximately normally distributed. The Durbin-Watson test suggests no autocorrelation in the residuals. The skew is close to zero, indicating a roughly symmetric residual distribution; note that statsmodels reports kurtosis on a scale where a normal distribution has a value of 3, so the kurtosis should be compared with 3 rather than 0. Other variables not considered in the model may also influence temperatures.
Analysis Conclusion Summary
In conclusion, the regression model provides insights into how CO2 levels and the countries' locations in Africa impact annual temperatures. It suggests that higher CO2 levels are associated with higher temperatures.
-----------------------------------------------------------------------------------------------------------------------
Thank you for taking the time to review this project. I welcome your comments, suggestions, and feedback.
For any project discussions or job opportunities, please feel free to contact me at:
Email Address: michaeljeremiah124@gmail.com
Phone: 234706664402
GitHub: https://github.com/mikeolani





