Activity: Data wrangling in Python

Instructions:

Working in Python: To work in Python, connect to the DEAC OnDemand server:

https://sta279-f25.github.io/resources/rstudio_server/

Overview

Back to the Gapminder data

In this activity, we will revisit the Gapminder data that we have worked with previously. To load the Gapminder data into Python, run the following code in R:

library(gapminder)

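Then run the following code in Python. The r object gives Python access to objects in the R session, so r.gapminder returns the R data frame as a pandas DataFrame: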

import pandas as pd
import numpy as np

gapminder = r.gapminder

Use Python and pandas to complete the following questions.

  1. Choose only the rows in the gapminder data for countries in Asia in 2002.

  2. Count the number of countries in each continent for the data in 2002. (Hint: use the 'count' function in agg.)

  3. Use the assign function to create a new column that contains the natural log of GDP per capita. (You may need to look up the pandas documentation for the assign function.)

  4. Create the following table, which summarizes the 2002 data by continent:

           num_countries  mean_log_gdp
continent                             
Africa                52      7.367332
Americas              25      8.847365
Asia                  33      8.542181
Europe                30      9.808402
Oceania                2     10.191543
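
As a rough illustration of the pandas tools these questions rely on (boolean row selection, assign, and groupby with agg), here is a minimal sketch on a small made-up data frame. The data frame toy, its values, and the names asia_2002, toy_log, and summary are hypothetical stand-ins and are not part of the Gapminder data; only the column names mirror gapminder. It is one possible approach, not the only way to answer the questions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for gapminder; all values are made up.
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E", "F"],
    "continent": ["Asia", "Asia", "Europe", "Europe", "Asia", "Europe"],
    "year": [2002, 2002, 2002, 2002, 1997, 1997],
    "gdpPercap": [1000.0, 2000.0, 30000.0, 40000.0, 900.0, 25000.0],
})

# Row selection with a boolean mask. Each condition goes in parentheses,
# and & combines them element-wise.
asia_2002 = toy[(toy["continent"] == "Asia") & (toy["year"] == 2002)]

# assign returns a new DataFrame with the extra column and leaves toy unchanged.
# np.log is the natural log.
toy_log = toy.assign(log_gdp=np.log(toy["gdpPercap"]))

# Named aggregation: each keyword becomes an output column, built from an
# (input column, function) pair.
summary = (
    toy_log[toy_log["year"] == 2002]
    .groupby("continent")
    .agg(
        num_countries=("country", "count"),
        mean_log_gdp=("log_gdp", "mean"),
    )
)

print(asia_2002)
print(summary)
```

The same pattern should carry over to the gapminder data frame loaded above, since it uses the same column names. Note that columns stored as R factors may arrive in pandas as categorical columns, which can affect which groups appear in a groupby result.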