Instructions:
Working in Python: To work in Python, you will connect to the DEAC OnDemand server:
https://sta279-f25.github.io/resources/rstudio_server/
In this class activity, we will continue exploring Python fundamentals. When we first introduced Python, we got a glimpse of arrays (similar to vectors in R) and loops. In this activity, we will see arrays, lists, loops, and if-else statements.
Recall vectors in R. R vectors contain only one type:
x <- c(0, 1, "a")
x # the numbers get coerced into characters
[1] "0" "1" "a"
Many operations work element-wise:
x <- c(0, 1, 2)
y <- c(1, 2, 3)
x + y
[1] 1 3 5
x == y
[1] FALSE FALSE FALSE
x + 1
[1] 1 2 3
And many functions are (or can be) vectorized:
x <- c(0, 1, 2)
sqrt(x)
[1] 0.000000 1.000000 1.414214
A 1-dimensional NumPy array functions very similarly to a vector in R. Like an R vector, it can contain only one type:
import numpy as np
x = np.array([0, 1, "a"])
x
array(['0', '1', 'a'], dtype='<U21')
Many operations work element-wise:
x = np.array([0, 1, 2])
y = np.array([1, 2, 3])
x + y
array([1, 3, 5])
x == y
array([False, False, False])
x + 1
array([1, 2, 3])
And many functions are “vectorized”:
x = np.array([0, 1, 2])
np.sqrt(x)
array([0.        , 1.        , 1.41421356])
Indexing a 1-d NumPy array is very similar to indexing a vector in R. Consider the following R code:
x <- c(1, 5, 9)
x[2] # select the second entry
[1] 5
x[1:2] # select the first and second entries
[1] 1 5
x[c(1, 3)] # select the first and third entries
[1] 1 9
x = np.array([1, 5, 9])
In R, I can subset a vector using a vector of booleans. For example, consider:
x <- c(11, 23, 49)
x[c(TRUE, FALSE, TRUE)]
[1] 11 49
That is, I keep only the entries in x where the boolean vector is TRUE. The boolean vector can be created however we want, and it is often based on the values in a second vector. For example, the following code creates two vectors called x and y, and keeps only the entries in x where the corresponding entry of y is greater than 3:
x <- c(23, 45, 5, 11)
y <- c(0, 4, -1, 20)
x[y > 3]
[1] 45 11
Or, I can subset a vector based on a logical statement about itself:
x <- sample(1:100, 10, replace=F)
x
 [1]  5 67  6 73 31 69 21  1 97  7
x[x < 50]
[1]  5  6 31 21  1  7
Two useful functions here are np.random.choice, which takes a random sample, and np.arange:
np.arange(4) # 0 to 3
array([0, 1, 2, 3])
np.arange(1, 5) # 1 to 4
array([1, 2, 3, 4])
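As a rough sketch of how np.random.choice and np.arange combine with boolean indexing, the code below draws a random sample without replacement and keeps only the entries below 50, mirroring the R example above. The seed value and the variable names (keep, small) are arbitrary choices made for this illustration, not part of the original activity.

```python
import numpy as np

np.random.seed(279)  # arbitrary seed, chosen only for reproducibility

# sample 10 distinct integers from 1 to 100 (np.arange excludes the endpoint)
x = np.random.choice(np.arange(1, 101), size=10, replace=False)

# a boolean array marking which entries of x are below 50
keep = x < 50

# boolean indexing keeps only the entries where keep is True
small = x[keep]

print(x)
print(small)
```

Just as in R, the boolean array could also come from a second array of the same length, for example x[y > 3].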
Arrays and vectors don’t always behave the same. One important example occurs when we use a negative number to index. In R, a negative index drops that entry and returns the rest of the vector:
x <- c(4, 5, 6)
x[-1]
[1] 5 6
In Python, a negative index counts backward from the end of the array, so -1 selects the last entry:
x = np.array([4, 5, 6])
x[-1]
np.int64(6)
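To make the contrast concrete, here is a small sketch of how one might reproduce R's x[-1] (drop the first entry) in NumPy. Slicing and np.delete are two options among several; neither is claimed to be the approach intended by the activity, and the variable names are illustrative.

```python
import numpy as np

x = np.array([4, 5, 6])

last = x[-1]             # negative index counts from the end: the last entry
second_to_last = x[-2]   # the entry just before the last one

without_first = x[1:]                  # slice from position 1 onward
also_without_first = np.delete(x, 0)   # copy of x with position 0 removed

print(last, second_to_last)
print(without_first, also_without_first)
```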
A list in R is a hierarchical structure that can contain objects of multiple types. For example, in the following R code, x is a list which contains a vector and another list. We can pull out different pieces of this list:
x <- list(c("a", "b"), list(1, 2, c(4, 5)))
x[[1]]
[1] "a" "b"
x[[2]][[3]]
[1] 4 5
Like lists in R, lists in Python allow us to hold objects of multiple types. Indexing is similar to R, but uses single brackets [ ] instead of double brackets, and positions are counted from 0 rather than 1.
x = ["a", 0, 1]
x[0]
'a'
x[1] + 1
1
x = [np.array(["a", "b"]), [1, 2, np.array([4, 5])]]
x[1]
[1, 2, array([4, 5])]
x[1][2]
array([4, 5])
In R, we can’t do arithmetic operations on a list:
x <- list(0, 1, 2)
x + 1
Error in x + 1: non-numeric argument to binary operator
x * 2
Error in x * 2: non-numeric argument to binary operator
The same is partly true in Python: the + and * operators are defined for lists, but they do not perform arithmetic. “Addition” between two lists concatenates them (note that both objects have to be lists!):
x = [10, 11, 12]
x + [1]
[10, 11, 12, 1]
while “multiplication” of a list repeats it:
x = [10, 11, 12]
x * 2
[10, 11, 12, 10, 11, 12]
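If element-wise arithmetic is what you want, one option is to convert the list to a NumPy array first; another is a list comprehension. The sketch below contrasts these with the list operators shown above. The variable names are illustrative only.

```python
import numpy as np

x = [10, 11, 12]

concatenated = x + [1]         # list + list concatenates
repeated = x * 2               # list * integer repeats the list

plus_one = np.array(x) + 1     # converting to an array gives element-wise addition
doubled = [2 * v for v in x]   # a list comprehension also works element by element

message = None
try:
    x + 1                      # a list and an integer cannot be added
except TypeError as err:
    message = str(err)

print(plus_one, doubled)
print(message)
```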
Here is a very small example of an if...else... in R. If x is odd, we print "odd", otherwise we print "even":
x <- 3
if(x %% 2 == 1){
  print('odd')
} else {
  print('even')
}
[1] "odd"
We can also add other conditions with else if blocks:
x <- "a"
if(!is.numeric(x)){
  print('not a number')
} else if(x %% 2 == 1) {
  print('odd')
} else {
  print('even')
}
[1] "not a number"
Here is the equivalent code in Python; just like with loops, Python uses white space to denote which lines go inside the if and else blocks:
x = 3
if x % 2 == 1:
    print("odd")
else:
    print("even")
odd
Instead of the ! symbol to denote negation, Python uses the word not:
x = 'a'
if not type(x) in [int, float]:
    print('not a number')
elif x % 2 == 1:
    print("odd")
else:
    print("even")
not a number
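A common alternative to comparing type(x) against a list is the built-in isinstance function. The sketch below wraps the same logic in a helper; the function name parity_label is hypothetical and introduced only for this illustration. Because bool is a subclass of int in Python, the sketch excludes it explicitly, which is a design choice rather than a requirement.

```python
def parity_label(x):
    """Return 'not a number', 'odd', or 'even' for the input x."""
    # isinstance accepts a tuple of types; bool is excluded because
    # True and False would otherwise count as integers
    if isinstance(x, bool) or not isinstance(x, (int, float)):
        return "not a number"
    elif x % 2 == 1:
        return "odd"
    else:
        return "even"


print(parity_label("a"))
print(parity_label(3))
print(parity_label(4))
```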
Sometimes, if...else... statements can be written in a single line:
x = 3
y = 'odd' if x % 2 == 1 else 'even'
print(y)
odd
Generally, logical statements in Python are similar to logical statements in R. The main syntax difference is that Python spells out the words and and or, where R uses symbols such as & and |:
x = 3
y = 5
x < 3 or y == 5
True
x == 3 and y == 4
False
Unlike R, we can also chain inequalities:
0 < 2 < 3 < 4
True
Finally, to combine comparisons element-wise on NumPy arrays, use & and | instead of and and or:
x = np.array([0, 3, 4])
y = np.array([2, 3, 4])
(x < 3) & (y < 4) # the parentheses are important here
array([ True, False, False])
(x < 3) | (y < 4)
array([ True,  True, False])
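The parentheses matter because & binds more tightly than comparison operators such as <. The sketch below shows what happens without them, and uses np.logical_and and np.logical_or as an alternative spelling of the same operations. The variable names are illustrative.

```python
import numpy as np

x = np.array([0, 3, 4])
y = np.array([2, 3, 4])

both = (x < 3) & (y < 4)     # element-wise "and"
either = (x < 3) | (y < 4)   # element-wise "or"

# the same results using NumPy's named functions
both_named = np.logical_and(x < 3, y < 4)
either_named = np.logical_or(x < 3, y < 4)

# without parentheses, Python reads this as x < (3 & y) < 4,
# a chained comparison that needs a single True/False and fails for arrays
try:
    x < 3 & y < 4
    failed = False
except ValueError:
    failed = True

print(both, either, failed)
```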
Now, let’s put our Python knowledge together to write a simulation! Previously, we created a simulation for the following gambling scenario:
Here is the R code we wrote for this simulation:
set.seed(279)
nsim <- 1000
results <- rep(0, nsim)
wheel <- c(rep("green", 2), rep("black", 18), rep("red", 18))
for(i in 1:nsim){
  money <- 50 # starting money
  while(money > 0 & money < 100){
    spin <- sample(wheel, size = 1)
    if(spin == "red"){
      money <- money + 1
    } else {
      money <- money - 1
    }
  }
  results[i] <- money == 100
}
mean(results)
The np.random.seed function can be used to set a seed.
The np.zeros function can be used to create an array of 0s.
If you’re done early, try to rewrite the movie theater simulation in Python. Here is the R code:
set.seed(111)
n_people <- 100 # number of people in the theater
nsim <- 1000 # number of simulations to estimate probability
results <- rep(0, nsim)
seats <- 1:n_people
for(i in 1:nsim){
  # vector to store which seats are taken
  # taken[i] is 0 when the seat is free
  taken <- rep(0, n_people)
  # first person randomly chooses a seat
  choice <- sample(seats, 1)
  taken[choice] <- 1
  # now go through everyone else (except the last person)
  for(j in 2:(n_people - 1)){
    # if the seat is free, take their seat. Otherwise,
    # randomly choose a seat from the ones available
    choice <- ifelse(taken[j] == 0, j,
                     sample(seats[taken == 0], 1))
    taken[choice] <- 1
  }
  results[i] <- taken[n_people]
}
mean(results)
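One possible Python translation of the R code above is sketched here, as an illustration rather than a reference solution. It assumes zero-based seat numbers (so the last seat is n_people - 1), wraps the loops in a function named simulate_theater (a name introduced here, not taken from the activity), and stores taken as booleans instead of 0s and 1s. Using the same seed value will not reproduce R's random draws.

```python
import numpy as np


def simulate_theater(n_people=100, nsim=1000, seed=111):
    """Estimate how often the last seat is already taken when the last person arrives."""
    np.random.seed(seed)
    results = np.zeros(nsim)
    seats = np.arange(n_people)  # seats are numbered 0 to n_people - 1

    for i in range(nsim):
        # taken[s] is False while seat s is free
        taken = np.zeros(n_people, dtype=bool)

        # first person randomly chooses a seat
        choice = np.random.choice(seats)
        taken[choice] = True

        # everyone else except the last person (positions 1 to n_people - 2)
        for j in range(1, n_people - 1):
            if not taken[j]:
                choice = j  # their own seat is free, so they take it
            else:
                # otherwise choose at random among the free seats
                choice = np.random.choice(seats[~taken])
            taken[choice] = True

        results[i] = taken[n_people - 1]

    return results.mean()


print(simulate_theater())
```

An explicit if/else replaces R's ifelse here, so a random seat is drawn only when it is actually needed.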