Work on the activity (handout) with a neighbor, then we will discuss as a class
Your friend writes the following code:
# A tibble: 1 × 3
median_height median_mass median_birth_year
<int> <dbl> <dbl>
1 NA NA NA
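The friend's code is not shown here. A call consistent with the output above might look like the following sketch; the column names are taken from that output, but the exact code is an assumption.

```r
library(dplyr)

# Assumed reconstruction: median() is called without na.rm,
# so any NA in a column makes that column's median NA.
friend_result <- starwars |>
  summarize(median_height = median(height),
            median_mass = median(mass),
            median_birth_year = median(birth_year))
friend_result
```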
Why are they getting NAs?
What would I change to ignore missing values (NAs) when computing the median?
# A tibble: 1 × 1
median_height
<int>
1 180
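One change consistent with this output is to pass `na.rm = TRUE` to `median()`. This is a sketch of that change for the height column only, not necessarily the exact code from the slide.

```r
library(dplyr)

# na.rm = TRUE drops missing values before the median is computed
height_result <- starwars |>
  summarize(median_height = median(height, na.rm = TRUE))
height_result
```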
Now let’s try with across…
Error in `summarize()`:
ℹ In argument: `across(where(is.numeric), list(median = median(na.rm =
T)))`.
Caused by error in `median.default()`:
! argument "x" is missing, with no default
Why is this code failing?
median is a function:
function (x, na.rm = FALSE, ...)
UseMethod("median")
<bytecode: 0x13cf29ab8>
<environment: namespace:stats>
median() calls (evaluates) the function right away, before across() ever sees it:
Error in `summarize()`:
ℹ In argument: `across(where(is.numeric), list(median = median(na.rm =
T)))`.
Caused by error in `median.default()`:
! argument "x" is missing, with no default
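The difference between the function object and a call to it can be checked in base R. This small sketch wraps the call in `tryCatch()` so the error is captured as text instead of stopping the script; the name `call_result` is only for illustration.

```r
# `median` without parentheses is the function object itself
is.function(median)

# `median(na.rm = TRUE)` is a call: R runs it immediately, finds no `x`,
# and signals an error instead of returning a function
call_result <- tryCatch(
  median(na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
call_result
```

`across()` needs the function itself in its list, which is why passing the result of a call does not work.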
What should we do instead?
We want a function that calculates the median without the NAs, so that we can pass it to across() by name, for example list(median = median_no_na).
However, this median_no_na function doesn’t exist. We have to write it ourselves!
What will each of the following lines return?
# Median that ignores missing values
median_no_na <- function(x) {
  median(x, na.rm = TRUE)
}
starwars |>
  summarize(across(where(is.numeric),
                   list("median" = median_no_na)))
# A tibble: 1 × 3
height_median mass_median birth_year_median
<int> <dbl> <dbl>
1 180 79 52
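To see what the wrapper does on its own, it can be called on a small vector. The vector `heights` below is a made-up example, not data from the slides.

```r
median_no_na <- function(x) {
  median(x, na.rm = TRUE)
}

# Hypothetical example vector with one missing value
heights <- c(150, NA, 180, 200)

median(heights)        # NA: the missing value propagates
median_no_na(heights)  # 180: the NA is dropped first
```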
What would I change if I want to calculate the mean instead of the median?
# Mean that ignores missing values
mean_no_na <- function(x) {
  mean(x, na.rm = TRUE)
}
starwars |>
  summarize(across(where(is.numeric),
                   list("mean" = mean_no_na)))
# A tibble: 1 × 3
height_mean mass_mean birth_year_mean
<dbl> <dbl> <dbl>
1 175. 97.3 87.6
Will we need to use the mean_no_na or median_no_na functions many times?
If we don’t need a function repeatedly, we can make an anonymous function instead:
Anonymous function:
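The anonymous-function code is not shown here. A minimal sketch of what it could look like, assuming the same `across()` call as before, is to define the function inline instead of naming it first. The `\(x)` form is R's shorthand for `function(x)` (R 4.1 or later).

```r
library(dplyr)

# Inline (anonymous) function: no separate median_no_na needed
anon_result <- starwars |>
  summarize(across(where(is.numeric),
                   list(median = function(x) median(x, na.rm = TRUE))))
anon_result

# Same computation with the shorthand lambda syntax
short_result <- starwars |>
  summarize(across(where(is.numeric),
                   list(median = \(x) median(x, na.rm = TRUE))))
short_result
```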
https://sta279-f25.github.io/class_activities/ca_08.html