Intro to Python

What is R?

  • R is a programming language specifically designed for statistics and data analysis
    • Objects for storing data, and functions for interacting with data, are fundamental
    • R is very good at graphics and visualization
    • R is easily extended. Users can write and share their own functions and packages
  • We can interact with R through IDEs like RStudio

What other options exist?

  • SAS
  • Stata
  • SPSS
  • Excel
  • Python
  • Julia
  • Matlab
  • Many others…

What is Python?

  • Python is a general-purpose programming language
  • Like R, Python has a wide range of packages that extend its functionality
  • Certain Python packages allow for sophisticated data analysis and modeling
    • SciPy, NumPy
    • scikit-learn, statsmodels, PyTorch
    • pandas
    • matplotlib

R vs. Python

My own personal preferences:

R is good for

  • Data visualization and wrangling
  • Classical statistics
  • Statistical inference
  • New statistical methods

Python is good for

  • General-purpose programming
  • Challenging data types (e.g. images)
  • Prediction and machine learning

Warmup

Work on the warmup activity (handout).

A taste of Python

import numpy as np

M = 10 
hats = np.arange(M) 
nsim = 10000 
results = np.zeros(nsim) 

for i in range(nsim):
    randomized_hats = np.random.choice(hats, M, replace = False)
    results[i] = np.sum(randomized_hats == hats) > 0

np.mean(results)
  • What is this code doing?
  • What similarities and differences do you notice, compared to R?

A taste of Python

Recall our code from a previous class:

M <- 10 # number of people at the party
hats <- 1:M # numbered hats
nsim <- 10000 # number of simulations
results <- rep(0, nsim) # vector to store the results

for(i in 1:nsim){
  randomized_hats <- sample(hats, M, replace = FALSE)
  results[i] <- sum(randomized_hats == hats) > 0
}

mean(results)

A taste of Python

Here is the same code, written in Python:

import numpy as np

M = 10 # number of people at the party
hats = np.arange(M) # numbered hats
nsim = 10000 # number of simulations
results = np.zeros(nsim) # to store the results

for i in range(nsim):
    randomized_hats = np.random.choice(hats, M, replace = False)
    results[i] = np.sum(randomized_hats == hats) > 0

np.mean(results)

Step 1: representing the hats

import numpy as np

M = 10 # number of people at the party
hats = np.arange(M) # numbered hats

hats
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
hats[0]
np.int64(0)
hats[1]
np.int64(1)
  • hats is a 1-dimensional array (similar to a vector in R)
  • Python is 0-indexed: the first entry is hats[0]
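
To make the indexing rule concrete, here is a small sketch using the same hats array. The variable names (first_hat, last_hat, also_last, first_three) are only illustrative and do not appear in the original code.

```python
import numpy as np

M = 10
hats = np.arange(M)  # array([0, 1, ..., 9])

first_hat = hats[0]      # first entry (R would write hats[1])
last_hat = hats[M - 1]   # last entry sits at index M - 1, not M
also_last = hats[-1]     # negative indices count from the end
first_three = hats[0:3]  # slices exclude the stop index: entries 0, 1, 2

print(first_hat, last_hat, also_last, first_three)
```

Note that a negative index means something different in R, where hats[-1] drops the first element instead of selecting the last one.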

Step 2: everyone draws a random hat

import numpy as np

M = 10 # number of people at the party
hats = np.arange(M) # numbered hats

randomized_hats = np.random.choice(hats, M, replace = False)

randomized_hats
array([7, 6, 1, 0, 9, 2, 8, 5, 4, 3])
  • np.random.choice works like R’s sample function
  • Booleans in Python are True and False (as opposed to TRUE and FALSE, or T and F, in R)
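
Because the draw is random, the output changes every time the code runs. One possible way to get reproducible draws is sketched below. It uses NumPy's Generator interface (np.random.default_rng) rather than np.random.choice from the slides, and the seed value 123 is an arbitrary choice made for this illustration.

```python
import numpy as np

M = 10
hats = np.arange(M)

# The seed 123 is arbitrary; any integer would work
rng = np.random.default_rng(123)

# Sampling all M hats without replacement amounts to a random shuffle
randomized_hats = rng.choice(hats, size=M, replace=False)

# A fresh generator with the same seed reproduces the same draw
rng_again = np.random.default_rng(123)
same_draw = rng_again.choice(hats, size=M, replace=False)

print(randomized_hats)
print(np.array_equal(randomized_hats, same_draw))
```

Setting a seed here plays roughly the same role as set.seed() in R.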

Step 3: check who got their original hat

import numpy as np

M = 10 # number of people at the party
hats = np.arange(M) # numbered hats

randomized_hats = np.random.choice(hats, M, replace = False)
randomized_hats
array([0, 3, 1, 5, 8, 7, 6, 4, 9, 2])
randomized_hats == hats
array([ True, False, False, False, False, False,  True, False, False,
       False])
np.sum(randomized_hats == hats)
np.int64(2)
  • NumPy arrays allow for “vectorized” operations, like in R
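
As a rough illustration of what "vectorized" buys us, the sketch below counts the matches twice: once with the element-wise comparison from this step, and once with an explicit loop. It reuses the shuffled hats shown above; the names matches, n_vectorized, n_loop, and at_least_one are made up for this example.

```python
import numpy as np

hats = np.arange(10)
randomized_hats = np.array([0, 3, 1, 5, 8, 7, 6, 4, 9, 2])

# Vectorized: compare every position at once, then add up the True values
matches = randomized_hats == hats
n_vectorized = np.sum(matches)

# Equivalent explicit loop over (drawn hat, original hat) pairs
n_loop = 0
for drawn, original in zip(randomized_hats, hats):
    if drawn == original:
        n_loop += 1

print(n_vectorized, n_loop)  # both counts are 2

# The simulation only records whether at least one person got their own hat
at_least_one = n_vectorized > 0
```

Summing a Boolean array counts the True entries because True is treated as 1 and False as 0, just as in R.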

Step 4: iteration

import numpy as np

M = 10 # number of people at the party
hats = np.arange(M) # numbered hats
nsim = 10000 # number of simulations
results = np.zeros(nsim) # to store the results

for i in range(nsim):
    randomized_hats = np.random.choice(hats, M, replace = False)
    results[i] = np.sum(randomized_hats == hats) > 0

np.mean(results)
  • range(nsim) is similar to 1:nsim in R, but it counts from 0 to nsim - 1
  • We don’t use curly braces { }. Instead, indentation marks the body of the loop (four spaces is standard; whatever you choose, be consistent)
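
The short sketch below shows which values range produces and how indentation decides what belongs to the loop. It uses a small nsim of 5 so the values are easy to read; the names indices, one_based, and total are illustrative only.

```python
nsim = 5

# range(nsim) yields 0, 1, ..., nsim - 1
indices = list(range(nsim))            # [0, 1, 2, 3, 4]

# To mimic R's 1:nsim, give explicit start and stop values
one_based = list(range(1, nsim + 1))   # [1, 2, 3, 4, 5]

total = 0
for i in range(nsim):
    total += i    # indented: runs on every iteration
print(total)      # not indented: runs once, after the loop ends
```

Starting at 0 is what lets results[i] fill every position of a 0-indexed array of length nsim.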

Using Python through RStudio

  • You can make Python chunks in Quarto documents, just like R chunks:
```{python}

```

Class activity

Work on the class activity on the course website. You do not need to submit anything for this activity.