Intro to Iteration

Class activity

https://sta279-f25.github.io/class_activities/ca_11.html

Work with a neighbor on the class activity
We will spend the first part of class on the activity, then discuss it together
At the end of class, submit your work as an HTML file on Canvas (one per group, list all your names)

Iteration motivation

What are some potential issues with the following code?

read_csv("intro_stats_grades/section_1.csv") |>
  slr_slope(midterm_1, midterm_2)

read_csv("intro_stats_grades/section_2.csv") |>
  slr_slope(midterm_1, midterm_2)

read_csv("intro_stats_grades/section_3.csv") |>
  slr_slope(midterm_1, midterm_2)

`purrr::map`

grade_files <- list.files("intro_stats_grades", full.names=T)
grade_tables <- map(grade_files, read_csv)

What is the map function doing here?

`purrr::map`

grade_tables <- map(grade_files, read_csv)

map: apply a function to each element of a list or vector

first argument: a list or vector
- grade_files: a vector of CSV file names to read into R
second argument: the function to apply
- read_csv: function to read a CSV file into R

“For each file in grade_files, apply the read_csv function to read it into R”
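One way to read this: map does the job of a for loop that fills in a list, one element at a time. Below is a minimal sketch of that equivalence. The file names and the use of nchar are made up for illustration (they are not part of the class activity), so the example runs without any data files.

```r
library(purrr)

# Hypothetical file names, used only as example inputs
example_files <- c("section_1.csv", "section_10.csv")

# Explicit loop: apply nchar to each element and store each result in a list
loop_result <- vector("list", length(example_files))
for (i in seq_along(example_files)) {
  loop_result[[i]] <- nchar(example_files[[i]])
}

# map: the same "for each element, apply the function" idea in one line
map_result <- map(example_files, nchar)

# Check whether the two approaches give the same list
identical(loop_result, map_result)
```

Replacing nchar with read_csv (and the made-up names with real paths) gives the grade_tables line above.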

`purrr::map`

grade_tables <- map(grade_files, read_csv)

(Image from Advanced R (2nd edition), Chapter 9)

`purrr::map`

grade_files <- list.files("intro_stats_grades", full.names=T)
grade_tables <- map(grade_files, read_csv)

map: apply a function to each element of a list or vector

Output: a list

typeof(grade_tables)

[1] "list"

length(grade_tables)

[1] 10

glimpse(grade_tables[[1]])

Rows: 35
Columns: 14
$ student_id <dbl> 55817, 32099, 40295, 54195, 15297, 81786, 49747, 78226, 102…
$ hw_1       <dbl> 10, 10, 10, 10, 10, 7, 10, 10, 9, 9, 8, 10, 10, 7, 8, 8, 10…
$ hw_2       <dbl> 10, 9, 10, 9, 8, 8, 9, 9, 9, 8, 10, 10, 10, 6, 9, 10, 8, 10…
$ hw_3       <dbl> 9, 10, 9, 9, 9, 6, 8, 9, 10, 10, 8, 9, 9, 9, 10, 9, 10, 8, …
$ hw_4       <dbl> 9, 9, 9, 6, 10, 6, 8, 10, 7, 9, 9, 10, 10, 9, 9, 8, 9, 10, …
$ hw_5       <dbl> 10, 10, 10, 9, 10, NA, 8, 9, 10, 9, NA, 10, 10, 4, 8, 10, 9…
$ hw_6       <dbl> 10, 9, 9, 9, 9, 6, 8, 10, 9, 9, 10, 10, 10, 8, NA, 9, 10, 1…
$ hw_7       <dbl> 10, 10, 9, 9, 10, 5, 6, 10, 8, 10, 8, 10, 10, 5, 7, 9, 9, 9…
$ hw_8       <dbl> 9, 10, 10, 9, 9, 7, 9, 9, 9, 10, 10, 10, 10, 8, 10, 9, 10, …
$ hw_9       <dbl> 8, 10, 10, 8, 10, 7, 7, 10, 10, 10, 8, 9, 9, 9, 8, 9, 10, 1…
$ midterm_1  <dbl> 97, 90, 95, 95, 94, 70, 79, 95, 89, 96, 90, 97, 88, 68, 86,…
$ midterm_2  <dbl> 96, 93, 91, 96, 92, 73, 83, 95, 84, 97, 87, 98, 93, 81, 88,…
$ final_exam <dbl> 93, 93, 97, 91, 93, 77, 77, 97, 88, 94, 89, 96, 98, 76, 85,…
$ project    <dbl> 93, 93, 97, 90, 87, 76, 84, 93, 89, 93, 82, 100, 93, 75, 89…

`purrr::map`

grade_files <- list.files("intro_stats_grades", full.names=T)
grade_tables <- map(grade_files, read_csv)

map: apply a function to each element of a list or vector

Output: a list

glimpse(grade_tables[[2]])

Rows: 29
Columns: 10
$ student_id <dbl> 88275, 99752, 81485, 34888, 56497, 14363, 31087, 34334, 278…
$ hw_1       <dbl> 8, 8, 10, 4, 5, 7, 5, 10, 10, 7, 6, 7, 7, 9, 9, NA, NA, 7, …
$ hw_2       <dbl> 6, 10, 9, 5, 8, 7, 8, 9, 10, NA, 8, 10, 8, NA, 10, 10, 8, 6…
$ hw_3       <dbl> 8, 10, 9, 6, 7, 10, 6, 7, 10, 10, 5, 10, 8, 7, 9, 8, 7, 8, …
$ hw_4       <dbl> 10, 10, 9, 9, 7, 9, 4, 8, 10, 7, 7, 8, 9, 9, 9, 9, NA, 7, 1…
$ hw_5       <dbl> 6, 7, 9, 7, 7, 9, 8, 8, 9, 6, 6, 8, 6, NA, 10, 8, 6, 5, 9, …
$ midterm_1  <dbl> 88, 84, 86, 68, 66, 85, 73, 67, 93, 72, 52, 85, 71, 81, 96,…
$ midterm_2  <dbl> 83, 88, 88, 70, 79, 84, 73, 64, 94, 74, 59, 90, 63, 88, 96,…
$ final_exam <dbl> 80, 88, 93, 58, 68, 82, 77, 69, 95, 71, 56, 82, 74, 90, 93,…
$ project    <dbl> 84, 87, 93, 61, 75, 81, 71, 73, 93, 65, 51, 87, 65, 89, 94,…
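To try the same pattern without the course data, here is a small sketch that writes two toy CSV files to a temporary folder and reads them back with map. The folder name, file names, and values are all invented for this example; only list.files, read_csv, and map come from the slides.

```r
library(readr)
library(purrr)

# Hypothetical folder, created inside R's temporary directory
grade_dir <- file.path(tempdir(), "toy_grades")
dir.create(grade_dir, showWarnings = FALSE)

# Two made-up sections with different numbers of rows and columns
write_csv(data.frame(student_id = 1:3, hw_1 = c(10, 9, 8), hw_2 = c(9, 9, 10)),
          file.path(grade_dir, "section_1.csv"))
write_csv(data.frame(student_id = 4:5, hw_1 = c(7, 10)),
          file.path(grade_dir, "section_2.csv"))

# Same two steps as on the slide: list the files, then read each one
toy_files <- list.files(grade_dir, pattern = "\\.csv$", full.names = TRUE)
toy_tables <- map(toy_files, function(f) read_csv(f, show_col_types = FALSE))

# map works on the list of tables too: get the dimensions of each table
table_dims <- map(toy_tables, dim)
table_dims
```

As with the real grade files, each element of the list can have its own shape, which is one reason map returns a list by default.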

Another example

x <- c(1, 4, 9, 16, 25)
map(x, sqrt)

What will this code produce?

Another example

x <- c(1, 4, 9, 16, 25)
map(x, sqrt)

[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] 4

[[5]]
[1] 5

`map` variants

If we want to return a vector instead of a list, we can use one of the map variants. For example, map_dbl returns a numeric (double) vector:

x <- c(1, 4, 9, 16, 25)
map_dbl(x, sqrt)

[1] 1 2 3 4 5
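purrr has similar variants for other vector types. The sketch below uses an invented character vector (words) to show three of them; each variant expects the function to return a single value of the matching type for every element.

```r
library(purrr)

# Made-up input for illustration
words <- c("apple", "fig", "banana")

# map_chr: each result is a single string, so we get a character vector
upper_words <- map_chr(words, toupper)

# map_int: nchar returns a single integer, so we get an integer vector
word_lengths <- map_int(words, nchar)

# map_lgl: each result is TRUE or FALSE, so we get a logical vector
is_long <- map_lgl(words, function(w) nchar(w) > 3)

upper_words
word_lengths
is_long
```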

Another example

map_dbl(1:10, function(x) x + 1)

What will this code produce?

Another example

map_dbl(1:10, function(x) x + 1)

 [1]  2  3  4  5  6  7  8  9 10 11

Class activity

slr_slope <- function(df, x, y) {
  df |>
    summarize(slope = cov({{ x }}, {{ y }}, use="complete.obs")/
                var({{ x }}, na.rm=T))
}

list.files("intro_stats_grades", full.names=T) |>
  map(read_csv) |>
  map(slr_slope)

Error in `map()`:
ℹ In index: 1.
Caused by error in `summarize()`:
ℹ In argument: `slope = cov(, , use = "complete.obs")/var(, na.rm = T)`.
Caused by error in `cov()`:
! is.numeric(x) || is.logical(x) is not TRUE

What is causing this error?

Class activity

slr_slope <- function(df, x, y) {
  df |>
    summarize(slope = cov({{ x }}, {{ y }}, use="complete.obs")/
                var({{ x }}, na.rm=T))
}

list.files("intro_stats_grades", full.names=T) |>
  map(read_csv) |>
  map(function(df) slr_slope(df, midterm_1, midterm_2))

[[1]]
# A tibble: 1 × 1
  slope
  <dbl>
1 0.756

[[2]]
# A tibble: 1 × 1
  slope
  <dbl>
1 0.871

[[3]]
# A tibble: 1 × 1
  slope
  <dbl>
1  1.07

[[4]]
# A tibble: 1 × 1
  slope
  <dbl>
1 0.873

[[5]]
# A tibble: 1 × 1
  slope
  <dbl>
1 0.859

[[6]]
# A tibble: 1 × 1
  slope
  <dbl>
1 0.881

[[7]]
# A tibble: 1 × 1
  slope
  <dbl>
1 0.963

[[8]]
# A tibble: 1 × 1
  slope
  <dbl>
1 0.969

[[9]]
# A tibble: 1 × 1
  slope
  <dbl>
1  1.01

[[10]]
# A tibble: 1 × 1
  slope
  <dbl>
1 0.983

`purrr::map`

map calls the function with each element as its first argument, so the function must be able to run when given only that element

# slr_slope needs THREE arguments (df, x, y), but map supplies only df:
list.files("intro_stats_grades", full.names=T) |>
  map(read_csv) |>
  map(slr_slope)

# the anonymous function needs only ONE argument (df) and fills in x and y itself:
list.files("intro_stats_grades", full.names=T) |>
  map(read_csv) |>
  map(function(df) slr_slope(df, midterm_1, midterm_2))
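The same pattern can be tested on a small scale. In this sketch, toy_tables is a made-up list of two tibbles standing in for the section files, and bind_rows (from dplyr) is an optional extra step, not part of the activity, that stacks the one-row results into a single table.

```r
library(dplyr)
library(purrr)

slr_slope <- function(df, x, y) {
  df |>
    summarize(slope = cov({{ x }}, {{ y }}, use = "complete.obs") /
                var({{ x }}, na.rm = TRUE))
}

# Invented data: midterm_2 is 2 * midterm_1 in the first table,
# and equal to midterm_1 in the second
toy_tables <- list(
  tibble(midterm_1 = c(1, 2, 3), midterm_2 = c(2, 4, 6)),
  tibble(midterm_1 = c(1, 2, 3), midterm_2 = c(1, 2, 3))
)

# The anonymous function takes one argument (df) and supplies x and y
toy_slopes <- toy_tables |>
  map(function(df) slr_slope(df, midterm_1, midterm_2)) |>
  bind_rows()

toy_slopes
```

In R 4.1 or later, `\(df)` can be written in place of `function(df)`; the two forms behave the same way.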

Another example

ex_list <- list(
  c(1, 2, 3),
  c(2, 3, 4)
)

map_dbl(ex_list, mean)

What do you think will be the output of this code?

Another example

ex_list <- list(
  c(1, 2, 3),
  c(2, 3, 4)
)

map_dbl(ex_list, mean)

[1] 2 3

ex_list[[1]]

[1] 1 2 3

mean(ex_list[[1]])

[1] 2

ex_list[[2]]

[1] 2 3 4

mean(ex_list[[2]])

[1] 3

Another example

ex_list <- list(
  c(1, 2, NA),
  c(2, 3, 4)
)

map_dbl(ex_list, mean)

What do you think will be the output of this code?

Another example

ex_list <- list(
  c(1, 2, NA),
  c(2, 3, 4)
)

map_dbl(ex_list, mean)

[1] NA  3

How do we ignore the NA when calculating the mean?

Another example

ex_list <- list(
  c(1, 2, NA),
  c(2, 3, 4)
)

map_dbl(ex_list, mean(na.rm=T))

Will this code work?

Another example

ex_list <- list(
  c(1, 2, NA),
  c(2, 3, 4)
)

map_dbl(ex_list, mean(na.rm=T))

Error in mean.default(na.rm = T): argument "x" is missing, with no default

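Problem: mean(na.rm=T) is not a function! It is a call to the mean function, and R tries to run that call right away, before map ever sees it. The call fails because no x was supplied.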

Solution: use an anonymous function!

Another example

ex_list <- list(
  c(1, 2, NA),
  c(2, 3, 4)
)

map_dbl(ex_list, function(x) mean(x, na.rm=T))

[1] 1.5 3.0
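There is also a second way to supply na.rm: map and its variants pass any extra named arguments along to the function. The sketch below compares the two approaches on the same ex_list; the names means_anon and means_dots are just labels for this example.

```r
library(purrr)

ex_list <- list(
  c(1, 2, NA),
  c(2, 3, 4)
)

# Approach from the slide: wrap the call in an anonymous function
means_anon <- map_dbl(ex_list, function(x) mean(x, na.rm = TRUE))

# Alternative: arguments listed after the function are passed on to mean()
means_dots <- map_dbl(ex_list, mean, na.rm = TRUE)

# Check whether both approaches give the same result
identical(means_anon, means_dots)
```

The anonymous function is more explicit about which argument receives each element, which tends to matter more as the function call gets more complicated.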
