R: Using dplyr summarise_all
Peter Lin
2018/4/6
Using summarise_all in the dplyr package
summarise vs summarise_all
First, let's look at a random sample of the data.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
iris[sample(1:nrow(iris),10),]
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 50 5.0 3.3 1.4 0.2 setosa
## 138 6.4 3.1 5.5 1.8 virginica
## 83 5.8 2.7 3.9 1.2 versicolor
## 23 4.6 3.6 1.0 0.2 setosa
## 54 5.5 2.3 4.0 1.3 versicolor
## 97 5.7 2.9 4.2 1.3 versicolor
## 24 5.1 3.3 1.7 0.5 setosa
## 141 6.7 3.1 5.6 2.4 virginica
## 20 5.1 3.8 1.5 0.3 setosa
## 30 4.7 3.2 1.6 0.2 setosa
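Because sample() draws different rows on every run, the ten rows above will not match what you see. A minimal sketch for making the sample reproducible, assuming any fixed seed is acceptable (the value 2018 is an arbitrary choice, not from the original post):

```r
# Fixing the seed makes sample() return the same rows on every run.
# 2018 is an arbitrary, hypothetical seed value.
set.seed(2018)
iris_sample <- iris[sample(nrow(iris), 10), ]
iris_sample
```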
Second, let's use summarise. Computing the mean of a single column for each species is straightforward:
iris %>% group_by(Species) %>% summarise(SL.mean=mean(Sepal.Length))
## # A tibble: 3 x 2
## Species SL.mean
## <fct> <dbl>
## 1 setosa 5.01
## 2 versicolor 5.94
## 3 virginica 6.59
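summarise() is not limited to one statistic: it takes any number of name = expression pairs and evaluates each one per group. A short sketch, where the column names n, SL.mean and SL.sd are illustrative choices:

```r
library(dplyr)

# Each name = expression pair becomes one output column;
# n() counts the rows in each group.
sl_stats <- iris %>%
  group_by(Species) %>%
  summarise(n = n(),
            SL.mean = mean(Sepal.Length),
            SL.sd = sd(Sepal.Length))
sl_stats
```

Every species has 50 rows, so n is 50 in each of the three output rows.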
# averaging every measurement column means spelling each one out by hand
iris %>%
  group_by(Species) %>%
  summarise(SL.mean = mean(Sepal.Length),
            SW.mean = mean(Sepal.Width),
            PL.mean = mean(Petal.Length),
            PW.mean = mean(Petal.Width))
## # A tibble: 3 x 5
## Species SL.mean SW.mean PL.mean PW.mean
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 setosa 5.01 3.43 1.46 0.246
## 2 versicolor 5.94 2.77 4.26 1.33
## 3 virginica 6.59 2.97 5.55 2.03
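For comparison outside dplyr, base R's aggregate() can produce the same per-species means with a formula, which is a handy cross-check of the numbers above. This is a base R alternative, not part of the dplyr workflow the post describes:

```r
# The formula `. ~ Species` means "every other column, grouped by Species".
base_means <- aggregate(. ~ Species, data = iris, FUN = mean)
base_means
```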
Finally, let's use summarise_all, which applies the function to every non-grouping column and keeps the original column names:
iris %>% group_by(Species) %>% summarise_all(mean)
## # A tibble: 3 x 5
## Species Sepal.Length Sepal.Width Petal.Length Petal.Width
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 setosa 5.01 3.43 1.46 0.246
## 2 versicolor 5.94 2.77 4.26 1.33
## 3 virginica 6.59 2.97 5.55 2.03
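In dplyr 1.0.0 and later, summarise_all() still works but is marked as superseded by across(). A sketch of the equivalent calls, assuming dplyr 1.0.0 or newer is installed; passing a named list of functions produces one column per column/function pair, named like Sepal.Length_mean:

```r
library(dplyr)

# across() skips the grouping column (Species) automatically,
# so this matches summarise_all(mean).
means_across <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), mean))

# A named list of functions yields Sepal.Length_mean, Sepal.Length_sd, ...
stats_across <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), list(mean = mean, sd = sd)))
```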