Skip to content

Homework 4

Instructions

In this assigment, the task is to analyse the number of cell phones in countries and how they relate to population size. The data is fraught with missing values and spread out into different files. Your task is to clean, combine and present the data.

Make sure of the following:

  • Your solutions should be in form of a report in .md format.
  • You have documented your procedure properly.
  • Your answers are clear and concise.

When you are finished push you results to Github and raise an issue, just as you have done in previous homeworks. To pass the homework you will have to complete the assigments below and also finish the peer-review.

Feel free to contact me if anything is unclear.

Data Preparation

Cleaning data

To begin with, take a look a the cell_phones_total.csv file. It contains data about the number of phones withtin countries over the years, from 1960-2019. You will find that some numbers are represented as string, where k=1e3, M=1e6 and B=1e9. The dataset also contains missing values. Clean the dataset. Make sure to:

  • Deal with missing values. Fill and/or remove missing values.
  • Convert all the relevant cells to numbers (not strings).

Each step has to be documented.

When you are finished preparing the data present a table of the following format: It does not have to exactly match but all the missing values should be handled. Here I have sorted the values by the year 2015.

iso-320152016201720182019
CHN1.29e+091.36e+091.47e+091.65e+091.73e+09
IND1e+091.13e+091.17e+091.18e+091.15e+09
USA3.82e+083.96e+084e+084.22e+08nan
IDN3.39e+083.86e+084.35e+083.19e+083.45e+08
BRA2.58e+082.44e+082.18e+082.07e+08nan

Hint: All the missing values should no be treated equally and you should not have to remove any data.

Combining Data

Now that we have cleaned our cell phones data, we want to analyse the number of cell phones per capita. To make this possible, we need to expand our original dataset. The file pop_data.csv contains data about the population size of countries over years. Once again there will be missing values. Deal with them before proceeding.

To combine the cell phone data with the population data, we need to reshape both datasets. This can be done by pivotting the tables into the right format. Remember the 3 rules from the previous lecture. Present a similar table to below the one below.

iso-3yearn_cellphonespopulation
ABW1960054608
AFG196008.62247e+06
AGO196005.3572e+06
ALB196001.6088e+06
AND196009443

The iso-3 format is nice to work with but for presentation it is better to work with real names of the countries. A mapping between iso-3 and country name can be found in the country_data.csv file.

Combine the cell phone, population and country data into a single table. Calcualte the number of phones per capita. I.e n_cellphones/population_size. Present a similar table to below. Here, the coutries are the ones with highest number of cell phones per capita in 2019.

yearMacaoHong KongUnited Arab EmiratesAntigua and BarbudaSeychelles
20153.088232.29042.007421.956841.58426
20163.143512.398932.212521.987541.60546
20173.21012.488772.183432.019341.74243
20183.348742.670212.199082.008161.8499
20193.284852.863652.127741.997461.9872

Analyse the country with highest growth rate of cell phones (per capita) in the last 10 years. That is, create a plot that illustrates the growth rate of the countries in the table above over the last 10 years.

Hint: You can pivot the tables to get the desired output.