On the first day of class, we discussed an experiment that investigated whether interacting with dogs can help reduce exam stress for college students. After cleaning the raw measurements, the experiment data looks like this:
Each row of the cleaned data represents one student, with columns for student ID (RID), their experimental group assignment (Control means handler-only contact, Indirect means a dog was present, Direct means direct contact with a dog), and their pre- and post-intervention scores for different wellbeing and illbeing measurements.
For example, happiness_pre is the student’s assessed happiness before the intervention, and happiness_post is the student’s assessed happiness after the intervention.
The data can be imported into R with the following code:
Reshape the cleaned_data in R so it looks like this (note that I am only displaying the first few rows):
  RID GroupAssignment measurement stage    score
1   1         Control          pa   pre 3.200000
2   1         Control          pa  post 3.800000
3   1         Control   happiness   pre 2.333333
4   1         Control   happiness  post 3.333333
5   1         Control          sc   pre 3.900000
6   1         Control          sc  post 3.800000
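One way to get this shape is tidyr's pivot_longer, splitting each column name at the underscore into a measurement part and a stage part. The sketch below is not the course's official solution. It runs on a small stand-in data frame, toy_wide, a hypothetical name: its first row uses the values for student 1 shown above, and its second row is made up for illustration. It also assumes every score column is named measurement_stage with exactly one underscore, which the real cleaned_data may or may not satisfy.

```r
library(tidyr)

# Toy stand-in for cleaned_data (hypothetical name and partly made-up values).
# Row 1 matches student 1 in the output above; row 2 is invented.
toy_wide <- data.frame(
  RID = c(1, 2),
  GroupAssignment = c("Control", "Direct"),
  pa_pre = c(3.2, 2.9),
  pa_post = c(3.8, 3.4),
  happiness_pre = c(2.333333, 4.0),
  happiness_post = c(3.333333, 4.5),
  sc_pre = c(3.9, 3.1),
  sc_post = c(3.8, 3.6)
)

# Pivot every column except the identifiers. Each column name is split at "_":
# the first piece becomes `measurement`, the second becomes `stage`,
# and the cell values go into `score`.
toy_long <- pivot_longer(
  toy_wide,
  cols = -c(RID, GroupAssignment),
  names_to = c("measurement", "stage"),
  names_sep = "_",
  values_to = "score"
)

print(head(toy_long))
```

Each student contributes one row per original score column, so the two toy students with six score columns each produce twelve rows.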
Using your reshaped data from question 2, fit a linear model with happiness as the response variable and GroupAssignment and stage as the explanatory variables. Would you have been able to fit this model without pivoting the data?
Call:
lm(formula = happiness ~ GroupAssignment + stage, data = cleaned_data)

Residuals:
     Min       1Q   Median       3Q      Max
-2.37622 -0.59949  0.06745  0.51084  1.73384

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
(Intercept)              3.48916    0.07151  48.793  < 2e-16 ***
GroupAssignmentDirect    0.11006    0.08743   1.259  0.20860
GroupAssignmentIndirect  0.02409    0.08743   0.276  0.78296
stagepre                -0.22300    0.07132  -3.127  0.00186 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8499 on 564 degrees of freedom
Multiple R-squared:  0.02004,  Adjusted R-squared:  0.01483
F-statistic: 3.844 on 3 and 564 DF,  p-value: 0.009631
We could not have fit this model without pivoting the data. Before pivoting, stage was not a variable in the data; it was encoded only in the column names (for example, happiness_pre and happiness_post), so we could not have included stage as an explanatory variable in the regression model.
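The model call above uses a column named happiness together with stage, whereas the long data from question 2 stores every measurement in a single score column. One possible way to bridge that gap, sketched here as an assumption rather than as the method actually used, is to spread measurement back into columns with pivot_wider while keeping stage as its own variable. The data frame toy_long and all of its values are made up for illustration.

```r
library(tidyr)

# Made-up long-format data: 4 students, 2 measurements, 2 stages each.
toy_long <- data.frame(
  RID = rep(1:4, each = 4),
  GroupAssignment = rep(c("Control", "Control", "Direct", "Direct"), each = 4),
  measurement = rep(c("pa", "pa", "happiness", "happiness"), times = 4),
  stage = rep(c("pre", "post"), times = 8),
  score = c(3.2, 3.8, 2.3, 3.3,
            2.9, 3.1, 3.0, 3.4,
            3.5, 4.1, 3.7, 4.4,
            2.6, 3.0, 2.8, 3.9)
)

# One row per student and stage, one column per measurement.
toy_model_data <- pivot_wider(
  toy_long,
  names_from = measurement,
  values_from = score
)

# stage is now an ordinary column, so it can enter the model formula.
fit <- lm(happiness ~ GroupAssignment + stage, data = toy_model_data)
print(coef(fit))
```

An alternative that skips pivot_wider would be to keep only the rows where measurement is "happiness" and use score as the response. Either way, the model is only possible because stage exists as a column.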