Intro to Python

What is R?

R is a programming language specifically designed for statistics and data analysis
- Objects for storing data, and functions for interacting with data, are fundamental
- R is very good at graphics and visualization
- R is easily extended. Users can write and share their own functions and packages
We can interact with R through IDEs like RStudio

What other options exist?

SAS
Stata
SPSS
Excel
Python
Julia
Matlab
Many others…

What is Python?

Python is a general-purpose programming language
Like R, Python has a wide range of packages to extend its functionality
Certain Python packages allow for sophisticated data analysis and modeling
- SciPy, NumPy
- scikit-learn, statsmodels, PyTorch
- pandas
- matplotlib

R vs. Python

My own personal preferences:

R is good for

Data visualization and wrangling
Classical statistics
Statistical inference
New statistical methods

Python is good for

General-purpose programming
Challenging data types (e.g. images)
Prediction and machine learning

Warmup

Work on the warmup activity (handout).

A taste of Python

import numpy as np

M = 10
hats = np.arange(M)
nsim = 10000
results = np.zeros(nsim)

for i in range(nsim):
    randomized_hats = np.random.choice(hats, M, replace=False)
    results[i] = np.sum(randomized_hats == hats) > 0

np.mean(results)

What is this code doing?
What similarities and differences do you notice, compared to R?

A taste of Python

Recall our code from a previous class:

M <- 10 # number of people at the party
hats <- 1:M # numbered hats
nsim <- 10000 # number of simulations
results <- rep(0, nsim) # vector to store the results

for(i in 1:nsim){
  randomized_hats <- sample(hats, M, replace = FALSE)
  results[i] <- sum(randomized_hats == hats) > 0
}

mean(results)

A taste of Python

Here is the same code, written in Python:

import numpy as np

M = 10 # number of people at the party
hats = np.arange(M) # numbered hats
nsim = 10000 # number of simulations
results = np.zeros(nsim) # to store the results

for i in range(nsim):
    randomized_hats = np.random.choice(hats, M, replace=False)
    results[i] = np.sum(randomized_hats == hats) > 0

np.mean(results)
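As an aside, NumPy can also run all of the simulated parties at once instead of looping. The sketch below is one way to do this, assuming NumPy 1.20 or later (for the Generator method permuted); the function name estimate_match_probability and the default seed are illustrative choices, not part of the original code. Each row of an nsim-by-M matrix is one party, each row is shuffled independently, and a row counts as a success if any hat is still in its original position.

```python
import numpy as np


def estimate_match_probability(M=10, nsim=10000, seed=42):
    """Estimate P(at least one person gets their own hat) without a Python loop.

    Hypothetical helper for illustration; assumes NumPy >= 1.20 for Generator.permuted.
    """
    rng = np.random.default_rng(seed)  # seeded generator, so results are reproducible
    hats = np.arange(M)  # numbered hats 0, 1, ..., M - 1
    # nsim copies of the hats: one simulated party per row
    all_hats = np.tile(hats, (nsim, 1))
    # shuffle each row independently (like calling sample() once per party)
    randomized_hats = rng.permuted(all_hats, axis=1)
    # broadcasting compares every row against hats; any() asks "at least one match?"
    at_least_one_match = np.any(randomized_hats == hats, axis=1)
    # the mean of a Boolean array is the proportion of True entries
    return np.mean(at_least_one_match)


print(estimate_match_probability())
```

For M = 10 the exact probability of at least one match is about 0.632 (very close to 1 - 1/e), so the estimate should land near that value, just as the loop version does.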

Step 1: representing the hats

import numpy as np

M = 10 # number of people at the party
hats = np.arange(M) # numbered hats

hats

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

hats[0]

np.int64(0)

hats[1]

np.int64(1)

hats is a 1-dimensional NumPy array (similar to a vector in R); np.arange(M) gives 0, 1, ..., M - 1, whereas R’s 1:M gives 1, ..., M
Python is 0-indexed: the first entry is hats[0]
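A few more indexing examples may help with the switch from R. This is a minimal sketch; the R equivalents in the comments are there for comparison, and hats_from_one is an illustrative name.

```python
import numpy as np

M = 10
hats = np.arange(M)  # 0, 1, ..., 9 (R's 1:M would give 1, ..., 10)

print(hats[0])    # first entry (hats[1] in R)
print(hats[-1])   # last entry; negative indices count from the end (hats[M] in R)
print(hats[0:3])  # slice: entries 0, 1, 2; the stop index 3 is excluded (hats[1:3] in R)

hats_from_one = np.arange(1, M + 1)  # 1, 2, ..., 10, matching R's 1:M
print(hats_from_one)
```

Note that a slice start:stop includes start but excludes stop, which is the same "stop one short" convention that np.arange and range follow.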

Step 2: everyone draws a random hat

import numpy as np

M = 10 # number of people at the party
hats = np.arange(M) # numbered hats

randomized_hats = np.random.choice(hats, M, replace=False)

randomized_hats

array([7, 6, 1, 0, 9, 2, 8, 5, 4, 3])

np.random.choice works like R’s sample function
Booleans in Python are True and False (as opposed to R’s TRUE and FALSE, or T and F)
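np.random.choice draws from NumPy's legacy global random generator, so the result changes on every run. A minimal sketch of a reproducible alternative, assuming the newer Generator interface (np.random.default_rng, available since NumPy 1.17); the seed 2024 is an arbitrary choice. Fixing the seed plays the same role as set.seed() in R.

```python
import numpy as np

M = 10
hats = np.arange(M)

rng = np.random.default_rng(2024)  # seeded generator (compare set.seed() in R)
randomized_hats = rng.choice(hats, size=M, replace=False)  # like sample(hats, M, replace = FALSE)
shuffled_hats = rng.permutation(hats)  # another way to get a random ordering of all M hats

# Drawing all M hats without replacement just reorders them, so sorting recovers hats
same_hats = np.array_equal(np.sort(randomized_hats), hats)
print(same_hats)  # True

# Two generators with the same seed produce the same draw
rng_a = np.random.default_rng(2024)
rng_b = np.random.default_rng(2024)
print(np.array_equal(rng_a.permutation(hats), rng_b.permutation(hats)))  # True
```

Either rng.choice with replace=False or rng.permutation works here; permutation states the intent (shuffle everything) a little more directly.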

Step 3: check who got their original hat

import numpy as np

M = 10 # number of people at the party
hats = np.arange(M) # numbered hats

randomized_hats = np.random.choice(hats, M, replace=False)
randomized_hats

array([0, 3, 1, 5, 8, 7, 6, 4, 9, 2])

randomized_hats == hats

array([ True, False, False, False, False, False,  True, False, False,
       False])

np.sum(randomized_hats == hats)

np.int64(2)

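NumPy arrays support “vectorized” operations, like vectors in R: randomized_hats == hats compares the two arrays element by element, and np.sum counts the True entries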
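To see the vectorized comparison without any randomness, the sketch below reuses the draw printed on this slide. It also shows that summing a Boolean array counts its True entries (True counts as 1, False as 0), that np.any is a direct way to ask whether at least one entry is True (an alternative to the np.sum(...) > 0 used in the simulation), and that np.flatnonzero returns the positions of the True entries, much like which() in R.

```python
import numpy as np

hats = np.arange(10)
randomized_hats = np.array([0, 3, 1, 5, 8, 7, 6, 4, 9, 2])  # the draw shown on this slide

matches = randomized_hats == hats  # elementwise comparison: one Boolean per person
print(np.sum(matches))             # 2: number of matches (True counts as 1)
print(np.any(matches))             # True: at least one match (like any() in R)
print(np.flatnonzero(matches))     # [0 6]: positions of the matches (like which(), but 0-indexed)
```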

Step 4: iteration

import numpy as np

M = 10 # number of people at the party
hats = np.arange(M) # numbered hats
nsim = 10000 # number of simulations
results = np.zeros(nsim) # to store the results

for i in range(nsim):
    randomized_hats = np.random.choice(hats, M, replace=False)
    results[i] = np.sum(randomized_hats == hats) > 0

np.mean(results)

range(nsim) plays the role of 1:nsim in R, but it produces 0, 1, ..., nsim - 1 (Python is 0-indexed)
We don’t use curly braces { } around the loop body. Instead, the for line ends with a colon and the body is indented (four spaces is standard; you just have to be consistent)
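A small pure-Python sketch of both points: range() stops one short of its end value, and indentation alone decides which lines belong to the loop body. The variable name total is illustrative.

```python
# range(n) gives 0, 1, ..., n - 1; range(1, n + 1) gives 1, ..., n like R's 1:n
print(list(range(5)))     # [0, 1, 2, 3, 4]
print(list(range(1, 6)))  # [1, 2, 3, 4, 5]

total = 0
for i in range(5):
    total = total + i     # indented: runs once per iteration
    print("inside the loop, i =", i)
print("after the loop, total =", total)  # not indented: runs once, after the loop ends
```

The final print shows total = 10 (0 + 1 + 2 + 3 + 4). Indenting that last line by four spaces would move it into the loop, so it would print five times instead of once.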

Using Python through RStudio

You can make Python chunks in Quarto documents, just like R chunks:

```{python}

```

Class activity

Work on the class activity on the course website. You do not need to submit anything for this activity.