Work on the activity (handout) with a neighbor, then we will discuss as a class
What is this code trying to do?
grouped_max <- function(df, group_var, max_var) {
  df |>
    group_by(group_var) |>
    summarize(max(max_var, na.rm = T))
}

grouped_max(penguins, species, bill_depth_mm)
Error in `group_by()`:
! Must group by variables found in `.data`.
✖ Column `group_var` is not found.
What is causing the error?
What should we change so the code runs correctly?
This code contains two different types of variables:
penguins is an env-variable (environment variable)
species is a data-variable (it makes sense only within the context of a data frame)
Env-variables are objects in the R environment that we can interact with directly. For example:
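A minimal sketch of a call that would print output like the tibble shown here. It assumes penguins comes from the palmerpenguins package and that dplyr is loaded (the slides do not show the library calls); the name first_rows is only for illustration.

```r
library(dplyr)           # assumption: loaded so the tibble prints as in the slides
library(palmerpenguins)  # assumption: source of the penguins data

# penguins is an object in the environment, so we can use it directly
first_rows <- head(penguins)
first_rows
```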
# A tibble: 6 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
4 Adelie  Torgersen           NA            NA                  NA          NA
5 Adelie  Torgersen           36.7          19.3               193        3450
6 Adelie  Torgersen           39.3          20.6               190        3650
# ℹ 2 more variables: sex <fct>, year <int>
Data-variables only exist in the context of a data frame:
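A sketch of the contrast, assuming penguins comes from the palmerpenguins package. Typing species on its own fails because no such object exists in the environment, while extracting it from the data frame prints the column. The names bare_lookup and species_col are only for illustration.

```r
library(palmerpenguins)  # assumption: source of the penguins data

# species on its own is not an object in the environment, so this fails;
# tryCatch() captures the error message instead of stopping the script
bare_lookup <- tryCatch(species, error = function(e) conditionMessage(e))
bare_lookup

# Inside the data frame, species is a column we can extract
species_col <- penguins$species
species_col
```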
[1] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[8] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[15] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[22] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[29] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[36] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[43] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[50] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[57] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[64] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[71] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[78] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[85] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[92] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[99] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[106] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[113] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[120] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[127] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[134] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[141] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[148] Adelie Adelie Adelie Adelie Adelie Gentoo Gentoo
[155] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
[162] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
[169] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
[176] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
[183] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
[190] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
[197] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
[204] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
[211] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
[218] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
[225] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
[232] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
[239] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
[246] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
[253] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
[260] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
[267] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
[274] Gentoo Gentoo Gentoo Chinstrap Chinstrap Chinstrap Chinstrap
[281] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
[288] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
[295] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
[302] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
[309] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
[316] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
[323] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
[330] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
[337] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
[344] Chinstrap
Levels: Adelie Chinstrap Gentoo
Many tidyverse functions are nice and allow us to reference data-variables:
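For instance, a sketch with filter. The exact condition used in the slides is not shown, so filtering to the Adelie penguins is an assumption, as are the library calls and the name adelie.

```r
library(dplyr)
library(palmerpenguins)

# species is written bare: filter() looks it up as a column of penguins
adelie <- penguins |>
  filter(species == "Adelie")
adelie
```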
Here filter knows to look for a column called species in the penguins data.
Of course, you will get an error if you try to reference a data-variable that doesn’t exist! E.g. if we misspell the name:
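A sketch of such a misspelling, where specie is a deliberately wrong, hypothetical column name. tryCatch captures the error so the script keeps running; the library calls are assumptions.

```r
library(dplyr)
library(palmerpenguins)

# "specie" is a deliberate misspelling; no such column exists in penguins
misspelled <- tryCatch(
  penguins |> filter(specie == "Adelie"),
  error = function(e) e
)

# TRUE: filter() could not find the column and signalled an error
inherits(misspelled, "error")
```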
The same kind of error is what grouped_max produced:
Error in `group_by()`:
! Must group by variables found in `.data`.
✖ Column `group_var` is not found.
The problem: group_var and max_var are not columns in the penguins data!
grouped_max <- function(df, group_var, max_var) {
  df |>
    group_by(group_var) |>
    summarize(max(max_var, na.rm = T))
}

grouped_max(penguins, species, bill_depth_mm)
Error in `group_by()`:
! Must group by variables found in `.data`.
✖ Column `group_var` is not found.
What we want R to run:
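A sketch of the intended pipeline, assuming dplyr and palmerpenguins are loaded: it is the function body with species and bill_depth_mm typed in place of the arguments. The name by_hand is only for illustration.

```r
library(dplyr)
library(palmerpenguins)

# The pipeline we would write by hand, with the column names typed directly
by_hand <- penguins |>
  group_by(species) |>
  summarize(max(bill_depth_mm, na.rm = TRUE))
by_hand
```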
What R is actually running:
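A sketch of the pipeline as written inside the function, assuming dplyr and palmerpenguins are loaded. Without any special handling, group_by looks for a column literally named group_var. tryCatch is used here only to capture the error message; the name literal_result is for illustration.

```r
library(dplyr)
library(palmerpenguins)

# group_by() searches penguins for a column called group_var, finds none,
# and signals an error before summarize() is ever reached
literal_result <- tryCatch(
  penguins |>
    group_by(group_var) |>
    summarize(max(max_var, na.rm = TRUE)),
  error = function(e) conditionMessage(e)
)
literal_result
```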
The fix: embrace the arguments with {{ }} so that group_by and summarize use the columns the arguments refer to:
grouped_max <- function(df, group_var, max_var) {
  df |>
    group_by({{ group_var }}) |>
    summarize(max({{ max_var }}, na.rm = T))
}

grouped_max(penguins, species, bill_depth_mm)
# A tibble: 3 × 2
  species   `max(bill_depth_mm, na.rm = T)`
  <fct>                               <dbl>
1 Adelie                               21.5
2 Chinstrap                            20.8
3 Gentoo                               17.3
What R is running now:
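A sketch checking that the embraced version behaves like the pipeline with the column names typed in, assuming dplyr and palmerpenguins are loaded. The names from_function and by_hand are only for illustration.

```r
library(dplyr)
library(palmerpenguins)

grouped_max <- function(df, group_var, max_var) {
  df |>
    group_by({{ group_var }}) |>
    summarize(max({{ max_var }}, na.rm = TRUE))
}

# With embracing, the arguments are replaced by the columns they name
from_function <- grouped_max(penguins, species, bill_depth_mm)

# The same pipeline written by hand
by_hand <- penguins |>
  group_by(species) |>
  summarize(max(bill_depth_mm, na.rm = TRUE))

# Compare the computed maxima from the two versions
identical(from_function[[2]], by_hand[[2]])
```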
Suppose we want to fit a simple linear regression model:
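A sketch of a call consistent with the coefficients shown here, assuming penguins comes from the palmerpenguins package, with bill_length_mm as the response and bill_depth_mm as the predictor. The name slr_coefs is only for illustration.

```r
library(palmerpenguins)  # assumption: source of the penguins data

# Fit the simple linear regression and extract the two coefficients
slr_coefs <- lm(bill_length_mm ~ bill_depth_mm, data = penguins) |>
  coef()
slr_coefs
```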
(Intercept) bill_depth_mm
55.0673698 -0.6498356
Do you think this code will work?
lm_coef <- function(df, x, y) {
  df |>
    lm({{ y }} ~ {{ x }}, data = _) |>
    coef()
}

lm_coef(penguins, bill_depth_mm, bill_length_mm)
Error: object 'bill_length_mm' not found
Why does this code fail?
Problem: The lm function does not support tidy evaluation! (To see if a function does support tidy evaluation, look for keywords like “data masking” or “tidy selection” in the documentation.)
If lm doesn’t support tidy evaluation, what could we do differently?
SLR slope: \(\widehat{\beta}_1 = \frac{\sum \limits_{i=1}^n (x_i - \overline{x})(y_i - \overline{y})}{\sum \limits_{i=1}^n (x_i - \overline{x})^2}\)
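A sketch of evaluating this formula directly on the penguins data, with x = bill_depth_mm and y = bill_length_mm. Dropping rows with a missing value in either variable is an assumption made here to match what lm does by default; the names ok, x, y and slope are only for illustration.

```r
library(palmerpenguins)  # assumption: source of the penguins data

# Keep only rows where both variables are observed
ok <- complete.cases(penguins$bill_depth_mm, penguins$bill_length_mm)
x <- penguins$bill_depth_mm[ok]
y <- penguins$bill_length_mm[ok]

# Numerator: sum of cross-deviations; denominator: sum of squared x-deviations
slope <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
slope
```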
(Intercept) bill_depth_mm
55.0673698 -0.6498356
[1] -0.6498356
How would I turn this into a function?
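One possible answer, sketched under the assumption that we compute the slope inside summarize, which supports data masking, so {{ }} works for the column arguments. The function name slr_slope and the NA filtering are hypothetical choices, not taken from the slides.

```r
library(dplyr)
library(palmerpenguins)

slr_slope <- function(df, x, y) {
  df |>
    # drop rows with a missing value in either variable
    filter(!is.na({{ x }}), !is.na({{ y }})) |>
    summarize(
      slope = sum(({{ x }} - mean({{ x }})) * ({{ y }} - mean({{ y }}))) /
        sum(({{ x }} - mean({{ x }}))^2)
    ) |>
    # return a single number rather than a one-row tibble
    pull(slope)
}

penguin_slope <- slr_slope(penguins, bill_depth_mm, bill_length_mm)
penguin_slope
```

Because summarize is a data-masking function, the embraced arguments are looked up as columns of df, which is exactly what lm could not do.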
https://sta279-f25.github.io/class_activities/ca_10.html
For next time, read: