Activity: Intro to Python

Instructions:

Work with a neighbor to answer the following questions
You do not need to submit anything for this activity

Working in Python: To work in Python, you will connect to the DEAC OnDemand server:

https://sta279-f25.github.io/resources/rstudio_server/

Overview

In this class activity, we will continue exploring Python fundamentals. When we first introduced Python, we got a glimpse of arrays (similar to vectors in R) and loops. In this activity, we will see arrays, lists, loops, and if-else statements.

1-dimensional NumPy arrays

Recap: vectors in R

Recall vectors in R. R vectors contain only one type:

x <- c(0, 1, "a")
x # the numbers get coerced into characters

[1] "0" "1" "a"

Many operations work element-wise:

x <- c(0, 1, 2)
y <- c(1, 2, 3)
x + y

[1] 1 3 5

x == y

[1] FALSE FALSE FALSE

x + 1

[1] 1 2 3

And many functions are (or can be) vectorized:

x <- c(0, 1, 2)
sqrt(x)

[1] 0.000000 1.000000 1.414214

NumPy arrays

A 1-dimensional NumPy array behaves much like a vector in R. Like R vectors, NumPy arrays can contain only one type:

import numpy as np

x = np.array([0, 1, "a"])
x

array(['0', '1', 'a'], dtype='<U21')
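To see which type NumPy settled on, you can inspect an array's dtype attribute. The short sketch below uses made-up arrays (the names ints, mixed, and strings are just for illustration) to show how the common type changes with the inputs:

```python
import numpy as np

ints = np.array([0, 1, 2])        # all integers -> integer dtype
mixed = np.array([0, 1, 2.5])     # an integer and a float -> everything becomes a float
strings = np.array([0, 1, "a"])   # numbers and a string -> everything becomes a string

print(ints.dtype.kind)     # 'i' (signed integer)
print(mixed.dtype)         # float64
print(strings.dtype.kind)  # 'U' (Unicode string)
```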

Many operations work element-wise:

x = np.array([0, 1, 2])
y = np.array([1, 2, 3])
x + y

array([1, 3, 5])

x == y

array([False, False, False])

x + 1

array([1, 2, 3])

And many functions are “vectorized”:

x = np.array([0, 1, 2])
np.sqrt(x)

array([0.        , 1.        , 1.41421356])
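To make "vectorized" concrete, the sketch below compares np.sqrt with an explicit loop that calls math.sqrt on one entry at a time. The variable names vectorized and looped are only for illustration; both approaches should give the same values:

```python
import math
import numpy as np

x = np.array([0, 1, 2])

# vectorized: one call handles every entry
vectorized = np.sqrt(x)

# explicit loop: apply math.sqrt to one entry at a time
looped = np.array([math.sqrt(value) for value in x])

print(np.allclose(vectorized, looped))  # True
```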

Indexing arrays

Indexing a 1-d NumPy array is very similar to indexing a vector in R. Consider the following R code:

x <- c(1, 5, 9)
x[2] # select the second entry

[1] 5

x[1:2] # select the first and second entries

[1] 1 5

x[c(1, 3)] # select the first and third entries

[1] 1 9

Question: Try to re-write the R code above in Python. Remember that Python is 0-indexed! Here is the NumPy array to get you started:

x = np.array([1, 5, 9])
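If you get stuck, here is a small reminder of the indexing rules on a different, made-up array z (so the question above is still yours to work out). Note that the end of a slice is excluded in Python:

```python
import numpy as np

z = np.array([10, 20, 30, 40])

print(z[0])       # first entry: 10
print(z[1:3])     # positions 1 and 2 (the end of a slice is excluded): [20 30]
print(z[[0, 3]])  # a list of positions selects the first and last entries: [10 40]
```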

Indexing using logical statements

In R, I can subset a vector using a vector of booleans. For example, consider:

x <- c(11, 23, 49)
x[c(TRUE, FALSE, TRUE)]

[1] 11 49

That is, I keep only the entries in x where the boolean vector is TRUE. Now, the boolean vector can be created however we want, often based on the values in a second vector. For example, the following code creates two vectors called x and y, and keeps only the entries in x where the corresponding entry of y is greater than 3:

x <- c(23, 45, 5, 11)
y <- c(0, 4, -1, 20)
x[y > 3]

[1] 45 11

Or, I can subset a vector based on a logical statement about itself:

x <- sample(1:100, 10, replace = FALSE)
x

 [1]  5 67  6 73 31 69 21  1 97  7

x[x < 50]

[1]  5  6 31 21  1  7

Question: Try to re-write the R code above in Python. Recall that you can use the np.random.choice function to take a random sample, and the np.arange function:

np.arange(4) # 0 to 3

array([0, 1, 2, 3])

np.arange(1, 5) # 1 to 4

array([1, 2, 3, 4])
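As a hint, here is a minimal sketch of how np.random.choice and a boolean mask fit together, using different numbers from the R example. The seed value and the names draws, mask, and large are assumptions made for illustration:

```python
import numpy as np

np.random.seed(0)  # hypothetical seed, only so the draw is reproducible

# draw 5 distinct values from 1, ..., 20
draws = np.random.choice(np.arange(1, 21), size=5, replace=False)

# a boolean mask has one True/False per entry
mask = draws > 10
large = draws[mask]  # keeps only the entries where the mask is True

print(draws)
print(large)
```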

An important difference between arrays and vectors

Arrays and vectors don’t always behave the same. One important example occurs when we use a negative number to index. In R, a negative index drops that entry and returns the rest of the vector:

x <- c(4, 5, 6)
x[-1]

[1] 5 6

In Python, a negative index counts backwards from the end of the array, so -1 selects the last entry!

x = np.array([4, 5, 6])
x[-1]

np.int64(6)
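If what you want is the R behavior (everything except the first entry), one option is a slice that starts at position 1; np.delete is another. The sketch below shows both alongside negative indexing:

```python
import numpy as np

x = np.array([4, 5, 6])

print(x[-1])            # last entry: 6
print(x[-2])            # second-to-last entry: 5
print(x[1:])            # everything except the first entry: [5 6]
print(np.delete(x, 0))  # same result, removing position 0: [5 6]
```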

Python lists

Recap: lists in R

A list in R is a hierarchical structure that can contain objects of multiple types. For example, in the following R code, x is a list which contains a vector and another list. We can pull out different pieces of this list:

x <- list(c("a", "b"), list(1, 2, c(4, 5)))
x[[1]]

[1] "a" "b"

x[[2]][[3]]

[1] 4 5

Lists in Python

Like lists in R, lists in Python allow us to hold objects of multiple types. Indexing is similar to R, but uses single brackets [ ] instead of double brackets.

x = ["a", 0, 1]
x[0]

'a'

x[1] + 1

1

x = [np.array(["a", "b"]), [1, 2, np.array([4, 5])]]
x[1]

[1, 2, array([4, 5])]

x[1][2]

array([4, 5])

An important difference between R and Python

In R, we can’t do arithmetic operations on a list:

x <- list(0, 1, 2)
x + 1

Error in x + 1: non-numeric argument to binary operator

x * 2

Error in x * 2: non-numeric argument to binary operator

Python lists don’t support element-wise arithmetic either, but the + and * operators are defined for them. In Python, “addition” between two lists concatenates them (note that both objects have to be lists!):

x = [10, 11, 12]
x + [1]

[10, 11, 12, 1]

while “multiplication” of a list repeats it:

x = [10, 11, 12]
x * 2

[10, 11, 12, 10, 11, 12]
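If you do want element-wise arithmetic on the contents of a list, two common options are a list comprehension or converting the list to a NumPy array. The sketch below shows both, and also what happens if you add a number directly to a list (the names plus_one and doubled are just for illustration):

```python
import numpy as np

x = [10, 11, 12]

# a list comprehension applies the operation to each entry
plus_one = [value + 1 for value in x]
print(plus_one)   # [11, 12, 13]

# converting to a NumPy array gives element-wise arithmetic
doubled = np.array(x) * 2
print(doubled)    # [20 22 24]

# adding a number directly to a list is an error
try:
    x + 1
except TypeError as err:
    print("TypeError:", err)
```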

if…else… statements

Recap: if…else… statements in R

Here is a very small example of an if...else... in R. If x is odd, we print "odd", otherwise we print "even":

x <- 3
if(x %% 2 == 1){
  print('odd')
} else {
  print('even')
}

[1] "odd"

We can also add other conditions with else if blocks:

x <- "a"
if(!is.numeric(x)){
  print('not a number')
} else if(x %% 2 == 1) {
  print('odd')
} else {
  print('even')
}

[1] "not a number"

In Python

Here is the equivalent code in Python; just like with loops, Python uses white space to denote which lines go inside the if and else blocks:

x = 3
if x % 2 == 1:
  print("odd")
else:
  print("even")

odd

Instead of the ! symbol to denote negation, Python uses the word not:

x = 'a'
if not type(x) in [int, float]:
  print('not a number')
elif x % 2 == 1:
  print("odd")
else:
  print("even")

not a number
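Another common way to check a type is the built-in isinstance function, which accepts a tuple of types. The sketch below wraps the same logic in a function; the name describe is hypothetical and not part of the original activity:

```python
def describe(x):
    """Return 'not a number', 'odd', or 'even' for the input x."""
    if not isinstance(x, (int, float)):
        return "not a number"
    elif x % 2 == 1:
        return "odd"
    else:
        return "even"

print(describe("a"))  # not a number
print(describe(3))    # odd
print(describe(4.0))  # even
```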

Another way of writing the code

Sometimes, if...else... statements can be written in a single line:

x = 3
y = 'odd' if x % 2 == 1 else 'even'
print(y)

odd

More on logical conditions

Generally, logical statements in Python are similar to logical statements in R. One difference is that Python uses the words and and or where R uses & (or &&) and | (or ||):

x = 3
y = 5
x < 3 or y == 5

True

x == 3 and y == 4

False

Unlike R, we can also chain inequalities:

0 < 2 < 3 < 4

True

Finally, to combine element-wise comparisons of NumPy arrays, use & and | instead of and and or:

x = np.array([0, 3, 4])
y = np.array([2, 3, 4])
(x < 3) & (y < 4) # the parentheses are important here

array([ True, False, False])

(x < 3) | (y < 4)

array([ True,  True, False])
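The sketch below illustrates why and does not work with arrays and why the parentheses matter: in Python, & is evaluated before <, so leaving the parentheses out changes the meaning. It also shows np.logical_and, a function form that gives the same result as & here:

```python
import numpy as np

x = np.array([0, 3, 4])
y = np.array([2, 3, 4])

# `and` asks for a single True/False, which an array cannot provide
try:
    (x < 3) and (y < 4)
except ValueError as err:
    print("ValueError:", err)

# without parentheses, & is evaluated before <, so this is read as
# x < (3 & y) < 4, which again needs a single True/False
try:
    x < 3 & y < 4
except ValueError as err:
    print("ValueError:", err)

# np.logical_and is an equivalent function form
both = np.logical_and(x < 3, y < 4)
print(both)  # [ True False False]
```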

Putting it all together

Now, let’s put our Python knowledge together to write a simulation! Previously, we created a simulation for the following gambling scenario:

A roulette wheel has 38 slots numbered 00, 0, and 1–36. Two are green, 18 are red, and 18 are black.
If a gambler bets based on color, the return on a $1 bet is $2
A gambler has $50, and will continuously bet $1 on red until they double their money (have $100) or lose the money they came with
What is the probability the gambler doubles their money?

Here is the R code we wrote for this simulation:

set.seed(279)

nsim <- 1000
results <- rep(0, nsim)
wheel <- c(rep("green", 2), rep("black", 18), rep("red", 18))

for(i in 1:nsim){
  money <- 50 # starting money

  while(money > 0 & money < 100){
    spin <- sample(wheel, size = 1)
    if(spin == "red"){
      money <- money + 1
    } else {
      money <- money - 1
    }
  }
  
  results[i] <- money == 100
}

mean(results)

Question: Re-write this simulation in Python. Some hints:
- The np.random.seed function can be used to set a seed
- The np.zeros function can be used to create an array of 0s
- Create the wheel using a Python list; remember that multiplying a list will repeat its entries, while adding lists together concatenates them
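The hints above can be sketched as follows. This is only the setup and a single spin, not the full simulation; it assumes np.random.choice for the spin, which is one reasonable option among several:

```python
import numpy as np

np.random.seed(279)

# multiplying repeats a list; adding concatenates lists
wheel = ["green"] * 2 + ["black"] * 18 + ["red"] * 18
print(len(wheel))  # 38

# np.random.choice with no size argument returns a single entry
spin = np.random.choice(wheel)
print(spin == "red")

# array of zeros to hold one result per simulation
nsim = 1000
results = np.zeros(nsim)
```

Because Python uses a different random number generator than R, the same seed will not reproduce the R results exactly.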

Another practice question…

If you’re done early, try to re-write the movie theater simulation in Python. Here is the R code:

set.seed(111)

n_people <- 100 # number of people in the theater
nsim <- 1000 # number of simulations to estimate probability
results <- rep(0, nsim)
seats <- 1:n_people

for(i in 1:nsim){
  # vector to store which seats are taken
  # taken[k] is 0 when seat k is free
  taken <- rep(0, n_people)
  
  # first person randomly chooses a seat
  choice <- sample(seats, 1)
  taken[choice] <- 1
  
  # now go through everyone else (except the last person)
  for(j in 2:(n_people - 1)){
    
    # if person j's own seat is free, they take it. Otherwise,
    # they randomly choose a seat from the ones available
    choice <- ifelse(taken[j] == 0, j, 
                     sample(seats[taken == 0], 1))
    taken[choice] <- 1
  }
  
  results[i] <- taken[n_people]
}

mean(results)
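One piece that needs care in the translation is the bookkeeping of free seats, since Python is 0-indexed. The sketch below uses a small, hypothetical theater of 5 seats to show how a boolean mask can pick out the free seats; it is a hint, not the full simulation:

```python
import numpy as np

n_people = 5                  # small hypothetical theater for illustration
seats = np.arange(n_people)   # seats 0, ..., 4 (0-indexed, unlike 1:n_people in R)
taken = np.zeros(n_people)    # 0 means free, 1 means taken

taken[2] = 1                  # mark seat 2 as taken
free = seats[taken == 0]      # seats that are still free
print(free)                   # [0 1 3 4]
```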