In statistics, we often summarize the variability or spread of a numeric variable by calculating the sample variance. Given \(n\) observations \(x_1,...,x_n\), the sample variance \(s^2\) is defined by
\[s^2 = \frac{1}{n-1} \sum \limits_{i=1}^n (x_i - \overline{x})^2\]
In R, this can be done with the var function. For example:
var(1:10)
[1] 9.166667
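As a check on this value, the definition can be applied by hand to the observations \(1, 2, \ldots, 10\), so \(n = 10\):
\[\overline{x} = \frac{1 + 2 + \cdots + 10}{10} = 5.5, \qquad \sum \limits_{i=1}^{10} (x_i - 5.5)^2 = 82.5, \qquad s^2 = \frac{82.5}{10 - 1} \approx 9.166667\]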
Write your own function for computing the sample variance, called my_var. You may use standard arithmetic operations in R, but do not use any existing implementations of the sample variance or standard deviation (e.g., don't use var or sd when writing your function).
Solution:
my_var <- function(x) {
  sum((x - mean(x))^2) / (length(x) - 1)
}
my_var(1:10)
[1] 9.166667
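One way to gain confidence in my_var is to compare it with the built-in functions on arbitrary data. The sketch below is only a check, not part of the solution: the seed, the sample size, and the use of rnorm are assumptions made for illustration, and var and sd appear only for comparison, never inside my_var itself.

```r
# Same definition as in the solution
my_var <- function(x) {
  sum((x - mean(x))^2) / (length(x) - 1)
}

set.seed(1)       # arbitrary seed, used only for reproducibility
x <- rnorm(100)   # arbitrary test data

# Compare with the built-in implementations (for checking only)
all.equal(my_var(x), var(x))
all.equal(sqrt(my_var(x)), sd(x))
```

Both comparisons should return TRUE, since the sample standard deviation is the square root of the sample variance.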
Using your my_var function, compute the variance of all the numeric columns in the diamonds data.
library(tidyverse)
diamonds |>
  summarize(across(where(is.numeric),
                   my_var))
# A tibble: 1 × 7
  carat depth table     price     x     y     z
  <dbl> <dbl> <dbl>     <dbl> <dbl> <dbl> <dbl>
1 0.225  2.05  4.99 15915629.  1.26  1.30 0.498
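To see what across(where(is.numeric), my_var) is doing, the same column-wise computation can be sketched in base R. This is an illustrative alternative rather than the intended solution; the helper names is_num and col_vars are hypothetical, and the diamonds data is loaded from ggplot2 (part of the tidyverse).

```r
library(ggplot2)  # provides the diamonds data

# Same definition as in the solution
my_var <- function(x) {
  sum((x - mean(x))^2) / (length(x) - 1)
}

# Flag the numeric columns, then apply my_var to each of them
is_num <- sapply(diamonds, is.numeric)
col_vars <- sapply(diamonds[is_num], my_var)
col_vars
```

The result is a named numeric vector with one entry per numeric column, rather than a one-row tibble.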