06 March 2015

Exploring the structure of national consumption

This entry is a direct continuation of my first exploration of the structure of national resource consumption.

library(ggplot2)
# READING IN DATA

## SETTING DIRECTORY FOR EORA DATA ON LOCAL HARD DRIVE
wd<-"G:/Documents/PostDocKVA/Data/Eora" ### data directory
setwd(wd)
dir()
##  [1] "countries.csv"              "country_lookup.csv"        
##  [3] "Eora26_2011_bp.zip"         "Eora26Structure.xlsx"      
##  [5] "gdppop.csv"                 "regionmembership.csv"      
##  [7] "TradeBalance_I-ENERGY.csv"  "TradeBalance_I-ENERGY.xlsx"
##  [9] "TradeBalance_I-VA.csv"      "TradeBalance_I-VA.xlsx"    
## [11] "Wiedmann"
## READING IN DATA
### MATERIAL USE DATA - ENERGY DATASET
energy.df<-read.csv("TradeBalance_I-ENERGY.csv",header=TRUE)
### Reading in .csv file with annual gdp and population sizes
gdppop.df<-read.csv("gdppop.csv",header=TRUE,skip=1) #skipping the first line which includes a description of the file

## REMOVING NEGATIVE AND ZERO CONSUMPTION ENTRIES
energy.df<-energy.df[which(energy.df[,"Consumption"]>0),]

## REMOVING THE "FORMER USSR" ENTRIES
## (logical indexing keeps every row when nothing matches, unlike -which(), which would then drop all rows)
energy.df<-energy.df[as.character(energy.df$Country)!="Former USSR",]


## merging the gdp and population size data onto the energy consumption data frame
energy.df<-merge(energy.df,gdppop.df,by=c("CountryA3","y","Country"),all.x=TRUE)


## To make consumption more comparable across countries, express it relative to population and GDP

### per capita consumption (val = population size) and GDP intensity of consumption (consumption per unit of GDP)
energy.df[,"Consum.pop.int"]<-energy.df[,"Consumption"]/energy.df[,"val"]
energy.df[,"Consum.gdp.int"]<-energy.df[,"Consumption"]/energy.df[,"GDP"]
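The merge-then-divide step above can be illustrated on a tiny dataset. This is a minimal sketch, assuming that `val` in gdppop.csv is population size (as the `head()` output further down suggests); the data frames `energy` and `gdppop` and all their numbers are invented for illustration.

```r
# Hypothetical toy stand-ins for the Eora energy table and gdppop.csv.
# Column names mirror the post; the numbers are invented.
energy <- data.frame(
  CountryA3   = c("AAA", "AAA", "BBB"),
  y           = c(1990, 1991, 1990),
  Country     = c("Alpha", "Alpha", "Beta"),
  Consumption = c(100, 120, 50),
  stringsAsFactors = FALSE
)
gdppop <- data.frame(
  CountryA3 = c("AAA", "BBB"),
  y         = c(1990, 1990),
  Country   = c("Alpha", "Beta"),
  GDP       = c(1000, 200),
  val       = c(10, 5),          # assumed: population size
  stringsAsFactors = FALSE
)

# Left join: every consumption row is kept; rows without a GDP/population
# match (here Alpha in 1991) get NA in GDP and val
merged <- merge(energy, gdppop, by = c("CountryA3", "y", "Country"), all.x = TRUE)

# Consumption per person and per unit of GDP; NA propagates through the division
merged$Consum.pop.int <- merged$Consumption / merged$val
merged$Consum.gdp.int <- merged$Consumption / merged$GDP
print(merged)
```

With `all.x=TRUE`, country-years that lack population or GDP data stay in the table with NA intensities, which is one way missing values like those behind ggplot's "Removed ... rows" warnings can arise.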

Picking up where we left off.

## visualizing per capita consumption and the GDP intensity of consumption
### per capita consumption

ggplot(energy.df,aes(y=Consum.pop.int,x=y,group=CountryA3)) + geom_line()
## Warning: Removed 320 rows containing missing values (geom_path).

### GDP intensity of consumption
ggplot(energy.df,aes(y=Consum.gdp.int,x=y,group=CountryA3)) + geom_line()

We immediately see that some time series contain one or more years with abnormal fluctuations. These anomalies are unlikely to reflect actual changes in the structure of consumption, but could instead be due to sudden changes in accounting methods. This is one of the main limitations of using accounting statistics to estimate consumption. Now let’s take a look at the countries and years that exhibit large anomalies.

Just by looking at the plots above we see that many of the per capita consumption anomalies occur in 1991.
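One way to go beyond eyeballing the plots is to flag, within each country, the years in which a value changes by more than some factor relative to the previous year. Below is a minimal sketch on invented data; the helper `flag_jumps` and its default threshold of 2 are hypothetical choices, not part of the original analysis.

```r
# Hypothetical helper: flag years in which a series changes by more than
# `threshold` (up or down) relative to the previous year of the same country.
flag_jumps <- function(df, value, group = "Country", time = "y", threshold = 2) {
  df <- df[order(df[[group]], df[[time]]), ]          # time order within each country
  prev_ratio <- ave(df[[value]], df[[group]],
                    FUN = function(v) v / c(NA, head(v, -1)))
  df$jump <- !is.na(prev_ratio) &
    (prev_ratio > threshold | prev_ratio < 1 / threshold)
  df
}

# Toy series (invented numbers): country A has a one-year spike in 1991
toy <- data.frame(
  Country = rep(c("A", "B"), each = 4),
  y = rep(1989:1992, 2),
  Consum.pop.int = c(1, 1.1, 5, 1.2, 2, 2.1, 2.2, 2.3)
)
flagged <- flag_jumps(toy, "Consum.pop.int")
flagged[flagged$jump, c("Country", "y")]
```

A one-year spike is flagged twice, once for the jump up (1991) and once for the fall back (1992). The sketch also assumes consecutive rows are consecutive years; gaps in a country's series would need extra handling.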

energy.df[order(energy.df$Consum.pop.int, decreasing=TRUE)[1:20],c("Country","y","Consum.pop.int")]
##                     Country    y Consum.pop.int
## 6392             San Marino 1991       5.676499
## 6261              Singapore 1975       5.469364
## 7568 British Virgin Islands 1991       3.648615
## 4431                 Monaco 1991       3.485693
## 1870         Cayman Islands 1991       3.416776
## 1030                Bermuda 1991       3.220180
## 4095          Liechtenstein 1991       3.176298
## 6260              Singapore 1974       2.641860
## 6476                 Serbia 1991       2.545683
## 2980                 Guyana 2010       2.480164
## 5962                  Qatar 1970       2.327962
## 253                     UAE 1970       2.189188
## 2978                 Guyana 2008       2.189014
## 5963                  Qatar 1971       2.177633
## 2979                 Guyana 2009       2.133004
## 254                     UAE 1971       2.059360
## 255                     UAE 1972       2.020770
## 5964                  Qatar 1972       2.003712
## 2977                 Guyana 2007       1.942043
## 22                    Aruba 1991       1.937733
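To put a number on how strongly 1991 dominates a top-k list like the one above, the years among the k largest values can be tabulated. A minimal sketch, where the helper name `top_k_years` and the toy data are hypothetical; on the real data it would be called as `top_k_years(energy.df, "Consum.pop.int")`.

```r
# Hypothetical helper: how often each year appears among the k largest values
top_k_years <- function(df, value, k = 20) {
  idx <- head(order(df[[value]], decreasing = TRUE), k)  # order() puts NAs last
  sort(table(df$y[idx]), decreasing = TRUE)
}

# Invented toy data
toy <- data.frame(y = c(1991, 1991, 1975, 1991, 2010), v = c(5, 4, 3, 6, 1))
res <- top_k_years(toy, "v", k = 4)
res
```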

It looks like many of the countries that exhibit per capita consumption anomalies are small in both area and population and have small territorial emissions (i.e. little domestic extraction of resources).

energy.df[order(energy.df$Consum.gdp.int, decreasing=TRUE)[1:20],c("Country","y","Consum.gdp.int")]
##          Country    y Consum.gdp.int
## 6560       Sudan 1991      53.764571
## 948      Belarus 1993      32.356232
## 4475     Moldova 1993      18.520411
## 4482     Moldova 2000      13.564694
## 6193 South Sudan 1991      12.848089
## 4481     Moldova 1999      11.373931
## 4474     Moldova 1992      10.666067
## 4484     Moldova 2002      10.644795
## 4480     Moldova 1998      10.634448
## 4485     Moldova 2003      10.523747
## 4486     Moldova 2004      10.270041
## 4479     Moldova 1997      10.089988
## 4483     Moldova 2001       9.791289
## 4487     Moldova 2005       9.641692
## 6539       Sudan 1970       9.358367
## 4478     Moldova 1996       8.784872
## 955      Belarus 2000       8.776421
## 4477     Moldova 1995       8.628466
## 4488     Moldova 2006       8.510303
## 949      Belarus 1994       8.374650

High GDP intensities of consumption, on the other hand, seem to be limited to a smaller number of countries, notably Moldova, but also Sudan, South Sudan and Belarus.
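Because a single country with many anomalous years can fill most of a top-20 list, it can help to aggregate per country first and rank countries by their single highest value. A minimal sketch on invented data; the country codes and numbers below are placeholders, not Eora values.

```r
# Invented toy data: one country (M) with several high years
toy <- data.frame(
  Country = c("M", "M", "M", "S", "S", "B"),
  y = c(1992, 1993, 1994, 1991, 1992, 1993),
  Consum.gdp.int = c(10, 18, 9, 50, 1, 30)
)

# Highest value per country, then countries ranked by that maximum
country_max <- tapply(toy$Consum.gdp.int, toy$Country, max, na.rm = TRUE)
ranked <- sort(country_max, decreasing = TRUE)
ranked
```

Each country now appears once, so the list shows how many distinct countries have extreme intensities rather than how many extreme country-years there are.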

For a more rigorous investigation of aberrant consumption values, I will scale each national time series to mean 0 and standard deviation 1.
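Scaling to mean 0 and sd 1 means computing z = (x - mean(x)) / sd(x) separately for each country. A minimal sketch on invented data using base R's `ave()` and `scale()`; the helper name `zscore` is hypothetical. Unlike a `by()` + `unlist()` combination, `ave()` writes results back in the original row order, so it does not depend on the data frame being sorted by country.

```r
# z = (x - mean(x)) / sd(x); scale() ignores NAs when computing mean and sd
zscore <- function(v) as.numeric(scale(v, center = TRUE, scale = TRUE))

# Invented toy data with interleaved countries (deliberately unsorted)
toy <- data.frame(
  Country = c("A", "B", "A", "B", "A", "B"),
  value   = c(1, 10, 2, 20, 3, 60)
)
toy$value.scale <- ave(toy$value, toy$Country, FUN = zscore)

# Each country's scaled series now has mean 0 and standard deviation 1
tapply(toy$value.scale, toy$Country, mean)
tapply(toy$value.scale, toy$Country, sd)
```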

energy.df<-energy.df[order(energy.df[,"Country"],energy.df[,"y"]),]
head(energy.df)
##    CountryA3    y     Country TerritorialEmissions Imports Exports
## 43       AFG 1970 Afghanistan               115043   15343    1343
## 44       AFG 1971 Afghanistan               115043   13345    1262
## 45       AFG 1972 Afghanistan               115043   11725    1465
## 46       AFG 1973 Afghanistan               115043    9633    1392
## 47       AFG 1974 Afghanistan               115043    8265    1213
## 48       AFG 1975 Afghanistan               115043    7828    1150
##    DirectEmissions Consumption     GDP      val Consum.pop.int
## 43           43301      129043 1277935 11839729    0.010899151
## 44           43301      127126 1362663 12138578    0.010472891
## 45           43301      125303 1168728 12449180    0.010065161
## 46           43301      123284 1266842 12760486    0.009661388
## 47           43301      122094 1567124 13058067    0.009350082
## 48           43301      121720 1722525 13328589    0.009132249
##    Consum.gdp.int
## 43     0.10097775
## 44     0.09329233
## 45     0.10721314
## 46     0.09731600
## 47     0.07790960
## 48     0.07066371
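## scaling each country's time series to mean 0 and sd 1 (z-scores)
## NB: by() returns the groups in the order of the Country levels, so unlist() only lines up with the rows because energy.df was sorted by Country above
energy.df[,"Consum.pop.int.scale"]<-unlist(by(energy.df,energy.df[,"Country"], function(x) scale(x[,"Consum.pop.int"],center=TRUE,scale=TRUE)))
energy.df[,"Consum.gdp.int.scale"]<-unlist(by(energy.df,energy.df[,"Country"], function(x) scale(x[,"Consum.gdp.int"],center=TRUE,scale=TRUE)))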

ggplot(energy.df,aes(y=Consum.pop.int.scale,x=y,group=CountryA3)) + geom_line()
## Warning: Removed 320 rows containing missing values (geom_path).

energy.df[order(energy.df$Consum.pop.int.scale,decreasing=TRUE)[1:20],c("Country","y","Consum.pop.int.scale")]
##                    Country    y Consum.pop.int.scale
## 4179               Lesotho 1991             6.198724
## 22                   Aruba 1991             6.159096
## 694           Burkina Faso 1991             6.140783
## 1744            Cape Verde 1991             6.136408
## 7694                 Samoa 1991             6.136137
## 4431                Monaco 1991             6.105693
## 6812            Seychelles 1991             6.101615
## 6518 Sao Tome and Principe 1991             6.086875
## 400                Antigua 1991             6.081820
## 4095         Liechtenstein 1991             6.080496
## 4808            Montenegro 1991             6.063593
## 6109                Rwanda 1991             6.057820
## 4682                  Mali 1991             6.055243
## 1030               Bermuda 1991             6.051178
## 4011               Liberia 1991             6.028471
## 988                 Belize 1991             6.009964
## 2877             Greenland 1991             5.906676
## 3647                 Japan 2005             5.887532
## 7652               Vanuatu 1991             5.823841
## 4557              Maldives 1991             5.796995
ggplot(energy.df,aes(x=Consum.pop.int.scale))+geom_histogram()+facet_wrap(~y)

The annual histograms of scaled per capita consumption show that 1991 is indeed an unusual year compared with its neighbouring years. Similarly, the distribution for 2000 appears to have a long right-skewed tail.
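The visual impression from the histograms can be backed by a simple count: the number of countries per year whose scaled value exceeds some cutoff. A minimal sketch on invented data; the cutoff |z| > 3 is an arbitrary, hypothetical choice. On the real data the same call would be `tapply(abs(energy.df$Consum.pop.int.scale) > 3, energy.df$y, sum, na.rm = TRUE)`.

```r
# Invented toy z-scores for three years
toy <- data.frame(
  y = c(1990, 1990, 1991, 1991, 1991, 1992),
  z = c(0.5, -1.2, 6.1, 5.8, -0.3, 3.5)
)

# Number of extreme values (|z| > 3) per year; NAs are ignored
extreme_by_year <- tapply(abs(toy$z) > 3, toy$y, sum, na.rm = TRUE)
extreme_by_year
```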

How about economic intensity of consumption?

ggplot(energy.df,aes(y=Consum.gdp.int.scale,x=y,group=CountryA3)) + geom_line()

energy.df[order(energy.df$Consum.gdp.int.scale,decreasing=TRUE)[1:20],c("Country","y","Consum.gdp.int.scale")]
##                    Country    y Consum.gdp.int.scale
## 6193           South Sudan 1991             6.162223
## 6560                 Sudan 1991             6.091356
## 4011               Liberia 1991             6.066484
## 6518 Sao Tome and Principe 1991             5.811309
## 948                Belarus 1993             5.561503
## 7694                 Samoa 1991             5.496875
## 6434               Somalia 1991             5.412538
## 4179               Lesotho 1991             5.225550
## 6602              Suriname 1991             4.830266
## 2982             Hong Kong 1970             4.536611
## 7736                 Yemen 1991             4.416178
## 694           Burkina Faso 1991             4.402034
## 2857             Greenland 1970             4.311507
## 1744            Cape Verde 1991             4.292522
## 4032                 Libya 1970             4.222122
## 6261             Singapore 1975             4.195786
## 1849        Cayman Islands 1970             4.152875
## 967                 Belize 1970             4.132067
## 3402               Iceland 1970             4.117356
## 6896                  Chad 1991             4.098158
ggplot(energy.df,aes(x=Consum.gdp.int.scale))+geom_histogram()+facet_wrap(~y)

Again, 1991 is abnormal compared to the years immediately before and after it. It also looks like very few countries have population data for 2011.
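Data coverage per year can be checked directly rather than read off the histograms. A minimal sketch on invented data, assuming (as above) that `val` holds population; on the real data the call would be `tapply(!is.na(energy.df$val), energy.df$y, sum)`, and replacing `sum` with `mean` gives the share of rows with data instead of the count.

```r
# Invented toy data: population (val) missing for every country in 2011
toy <- data.frame(
  y   = c(2009, 2009, 2010, 2010, 2011, 2011),
  val = c(5, 7, 5, 7, NA, NA)
)

# Number of rows with non-missing population in each year
coverage <- tapply(!is.na(toy$val), toy$y, sum)
coverage
```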

For now, to avoid the abnormal fluctuations in the consumption intensities in 1991 and 2000, I will remove these two years from the dataset. I will also remove 2011, for which very few countries have population data.

I will definitely need to return to 1991 and 2000 to better understand how energy consumption relates to population and the size of the economy in those years.

energy.df<-energy.df[!(energy.df[,"y"] %in% c(1991,2000,2011)),]
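A caveat when dropping rows in R: the `x[-which(cond), ]` idiom silently returns zero rows when `cond` matches nothing, because `-integer(0)` selects no rows. Negating a logical index avoids this. A small demonstration on invented data:

```r
# Invented data that contains none of the years to be dropped
df <- data.frame(y = c(1990, 1992, 1993), v = 1:3)
drop_years <- c(1991, 2000, 2011)

bad  <- df[-which(df$y %in% drop_years), ]  # which() is integer(0): no rows kept
good <- df[!(df$y %in% drop_years), ]       # logical index: all 3 rows kept
nrow(bad)   # 0
nrow(good)  # 3
```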

ggplot(energy.df,aes(y=Consum.pop.int.scale,x=y)) + geom_line(aes(group=CountryA3))
## Warning: Removed 135 rows containing missing values (geom_path).

ggplot(energy.df,aes(y=Consum.gdp.int.scale,x=y)) + geom_line(aes(group=CountryA3))