Activity: Practice pivoting with the dog data

Dog data

On the first day of class, we discussed an experiment that investigated whether interacting with dogs can help reduce exam stress for college students. After cleaning the raw measurements, the experiment data looks like this:

head(cleaned_data)
# A tibble: 6 × 18
    RID GroupAssignment pa_pre pa_post happiness_pre happiness_post sc_pre
  <dbl> <chr>            <dbl>   <dbl>         <dbl>          <dbl>  <dbl>
1     1 Control            3.2     3.8          2.33           3.33   3.9 
2     2 Direct             3       3.2          3.33           4      5.15
3     3 Indirect           2.8     3            2.67           3.33   4.1 
4     4 Control            4.2     3.8          3              3      4.65
5     5 Direct             3.4     4            2.67           2.67   3.65
6     6 Indirect           4.2     4.4          3              3.33   4.35
# ℹ 11 more variables: sc_post <dbl>, fs_pre <dbl>, fs_post <dbl>,
#   stress_pre <dbl>, stress_post <dbl>, homesick_pre <dbl>,
#   homesick_post <dbl>, lonely_pre <dbl>, lonely_post <dbl>, na_pre <dbl>,
#   na_post <dbl>

Each row of the cleaned data represents one student, with columns for student ID (RID), their experimental group assignment (Control means handler-only contact, Indirect means a dog was present, Direct means direct contact with a dog), and their pre- and post-intervention scores for different wellbeing and illbeing measurements.

For example, happiness_pre is the student’s assessed happiness before the intervention, and happiness_post is the student’s assessed happiness after the intervention.
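The pivoting questions below depend on every score column name having the form measurement_stage, with a single underscore between the two parts. Here is a quick base-R check of that convention. It is only a sketch: the vector of names is typed in by hand from the head() printout above, on the assumption that it matches the columns of cleaned_data other than RID and GroupAssignment.

```r
# Score column names, copied by hand from the head(cleaned_data) printout
score_cols <- c("pa_pre", "pa_post", "happiness_pre", "happiness_post",
                "sc_pre", "sc_post", "fs_pre", "fs_post",
                "stress_pre", "stress_post", "homesick_pre", "homesick_post",
                "lonely_pre", "lonely_post", "na_pre", "na_post")

# Split each name at "_" (fixed = TRUE treats "_" literally, not as a regex)
pieces <- strsplit(score_cols, "_", fixed = TRUE)

# Every name should split into exactly two pieces: measurement and stage
stopifnot(all(lengths(pieces) == 2))

measurement <- vapply(pieces, function(p) p[1], character(1))
stage <- vapply(pieces, function(p) p[2], character(1))

unique(measurement)  # "pa" "happiness" "sc" "fs" "stress" "homesick" "lonely" "na"
unique(stage)        # "pre" "post"
```

If a measurement name itself contained an underscore, a name would split into more than two pieces, and a single "_" separator would no longer be enough to tell the measurement apart from the stage.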

The data can be imported into R with the following code:

library(readr)
library(tidyr)  # provides pivot_longer(), used in the solutions below

cleaned_data <- read_csv("https://sta279-f25.github.io/class_activities/cleaned_dog_data.csv")

Pivoting

1. Reshape cleaned_data in R so it looks like this (note that I am only displaying the first few rows):
  RID GroupAssignment measurement stage    score
1   1         Control          pa   pre 3.200000
2   1         Control          pa  post 3.800000
3   1         Control   happiness   pre 2.333333
4   1         Control   happiness  post 3.333333
5   1         Control          sc   pre 3.900000
6   1         Control          sc  post 3.800000

Solution:

cleaned_data |>
  pivot_longer(cols = -c(RID, GroupAssignment),
               names_to = c("measurement", "stage"),
               names_sep = "_",
               values_to = "score")
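To see what names_sep = "_" does, here is a small sketch on a hypothetical toy tibble built from the first three rows of head(cleaned_data), keeping only the pa and happiness columns (happiness values rounded as printed). Each score column name is split at "_" into a measurement piece and a stage piece, so every student contributes one row per original score column. As a check, pivot_wider() with the same pieces rebuilds the original wide table.

```r
library(tibble)
library(tidyr)

# Hypothetical toy version of cleaned_data: first three students, two measurements
toy <- tibble(
  RID = c(1, 2, 3),
  GroupAssignment = c("Control", "Direct", "Indirect"),
  pa_pre = c(3.2, 3, 2.8),
  pa_post = c(3.8, 3.2, 3),
  happiness_pre = c(2.33, 3.33, 2.67),
  happiness_post = c(3.33, 4, 3.33)
)

# One row per student per score column: 3 students x 4 columns = 12 rows
long <- toy |>
  pivot_longer(cols = -c(RID, GroupAssignment),
               names_to = c("measurement", "stage"),
               names_sep = "_",
               values_to = "score")
print(long)

# Reverse the reshape: glue measurement and stage back together with "_"
wide_again <- long |>
  pivot_wider(names_from = c(measurement, stage),
              names_sep = "_",
              values_from = score)

# Check whether the round trip reproduces the toy data
all.equal(as.data.frame(wide_again), as.data.frame(toy))
```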

2. Reshape cleaned_data in R so it looks like this (note that I am only displaying the first few rows):

  RID GroupAssignment stage  pa happiness       sc    fs stress homesick lonely  na
1   1         Control   pre 3.2  2.333333 3.900000 6.125      2        3   2.25 1.0
2   1         Control  post 3.8  3.333333 3.800000 6.000      2        3   1.70 1.2
3   2          Direct   pre 3.0  3.333333 5.150000 5.250      2        4   1.90 1.8
4   2          Direct  post 3.2  4.000000 5.263158 6.000      1        2   1.60 1.0
5   3        Indirect   pre 2.8  2.666667 4.100000 5.375      4        3   2.25 1.6
6   3        Indirect  post 3.0  3.333333 4.150000 5.375      3        2   2.25 1.6

Solution:

cleaned_data |>
  pivot_longer(cols = -c(RID, GroupAssignment),
               names_to = c(".value", "stage"),
               names_sep = "_")
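In names_to, ".value" is a special marker rather than a new column name: the first piece of each split name (pa, happiness, ...) becomes the name of an output column holding the scores, and only the second piece goes into the new stage column. A sketch on the same hypothetical toy tibble (first three students, pa and happiness only) shows that this single call gives the same result as fully lengthening the data, as in question 1, and then widening it again by measurement.

```r
library(tibble)
library(tidyr)

# Hypothetical toy version of cleaned_data: first three students, two measurements
toy <- tibble(
  RID = c(1, 2, 3),
  GroupAssignment = c("Control", "Direct", "Indirect"),
  pa_pre = c(3.2, 3, 2.8),
  pa_post = c(3.8, 3.2, 3),
  happiness_pre = c(2.33, 3.33, 2.67),
  happiness_post = c(3.33, 4, 3.33)
)

# One step: ".value" sends pa/happiness to column names and pre/post to stage
by_stage <- toy |>
  pivot_longer(cols = -c(RID, GroupAssignment),
               names_to = c(".value", "stage"),
               names_sep = "_")

# Two steps: fully lengthen (as in question 1), then widen by measurement
two_step <- toy |>
  pivot_longer(cols = -c(RID, GroupAssignment),
               names_to = c("measurement", "stage"),
               names_sep = "_",
               values_to = "score") |>
  pivot_wider(names_from = measurement, values_from = score)

print(by_stage)

# Compare the two approaches
all.equal(as.data.frame(by_stage), as.data.frame(two_step))
```

The one-step version is shorter and avoids the intermediate score column, but the two-step version can be easier to reason about when you are first learning to pivot.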

3. Using your reshaped data from question 2, fit a linear model with happiness as the response variable and GroupAssignment and stage as the explanatory variables. Would you have been able to fit this model without pivoting the data?

Solution:

# Overwrite cleaned_data with the version that has one row per student per stage
cleaned_data <- cleaned_data |>
  pivot_longer(cols = -c(RID, GroupAssignment),
               names_to = c(".value", "stage"),
               names_sep = "_")

m1 <- lm(happiness ~ GroupAssignment + stage, data = cleaned_data)
summary(m1)

Call:
lm(formula = happiness ~ GroupAssignment + stage, data = cleaned_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.37622 -0.59949  0.06745  0.51084  1.73384 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)              3.48916    0.07151  48.793  < 2e-16 ***
GroupAssignmentDirect    0.11006    0.08743   1.259  0.20860    
GroupAssignmentIndirect  0.02409    0.08743   0.276  0.78296    
stagepre                -0.22300    0.07132  -3.127  0.00186 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8499 on 564 degrees of freedom
Multiple R-squared:  0.02004,   Adjusted R-squared:  0.01483 
F-statistic: 3.844 on 3 and 564 DF,  p-value: 0.009631

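We could not have fit this model without pivoting the data. Before pivoting, stage was not a variable in the data; it was only encoded in the column names (the _pre and _post suffixes). Happiness was also split across two columns, happiness_pre and happiness_post, so there was neither a single happiness response column nor a stage column to include in the regression model.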
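In the output above, the stage coefficient is labelled stagepre. lm() treats the character column stage as a factor, and factor levels are sorted alphabetically by default, so "post" is the baseline and the stagepre estimate (-0.22300) compares pre-intervention happiness with post-intervention happiness for students in the same group. If you would rather use pre as the baseline, you can set the factor levels explicitly. Here is a sketch on a hypothetical six-row toy version of the pivoted data (first three students, happiness only, values rounded as printed in question 2).

```r
library(tibble)

# Hypothetical toy version of the pivoted data: three students x two stages
toy_long <- tibble(
  RID = c(1, 1, 2, 2, 3, 3),
  GroupAssignment = c("Control", "Control", "Direct", "Direct", "Indirect", "Indirect"),
  stage = c("pre", "post", "pre", "post", "pre", "post"),
  happiness = c(2.33, 3.33, 3.33, 4, 2.67, 3.33)
)

# Default: levels sorted alphabetically, so "post" is the baseline
levels(factor(toy_long$stage))  # "post" "pre"
m_default <- lm(happiness ~ GroupAssignment + stage, data = toy_long)

# Make "pre" the baseline instead by fixing the level order
toy_long$stage <- factor(toy_long$stage, levels = c("pre", "post"))
m_pre_base <- lm(happiness ~ GroupAssignment + stage, data = toy_long)

# Same fitted model, reparameterized: stagepost is the negative of stagepre
coef(m_default)["stagepre"]
coef(m_pre_base)["stagepost"]
```

Changing the baseline does not change the fitted values or the p-value for stage; only the coefficient's label and sign flip.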