Seems like this should be easy, but I'm stumped. I've gotten the rough hang of programming with dplyr 0.7, but I'm struggling with this: how do I program in dplyr if the variable I want to program with is stored as a string?

I am scraping a database and, for a variety of reasons, want to summarize a variable whose position I know but whose name I don't. The variable I want is always the first column of the supplied table, but the name of the variable stored in that column varies depending on the database being scraped. Using iris as an example, suppose I know that the variable I want is in the first column:

library(tidyverse)
desired_var <- colnames(iris)[1]
print(desired_var)
[1] "Sepal.Length"

I now want to group by Species and take the mean of desired_var; that is, I want to perform the equivalent of:

iris %>%
  group_by(Species) %>%
  summarise(desired_mean = mean(Sepal.Length))

But now I want to take the mean of a column whose name is given by the string stored in desired_var.

I understand how to do this with a bare (unquoted) Sepal.Length:

desired_var <- quo(Sepal.Length)

iris %>%
  group_by(Species) %>%
  summarise(desired_mean = mean(!!desired_var))

But how in the world do I deal with the fact that I have "Sepal.Length", not Sepal.Length, i.e. that desired_var <- "Sepal.Length"?

2 Answers

1) Dynamic variable with !!parse_expr. Use rlang's parse_expr() to turn the string into an expression and unquote it with !!, like this:

library(dplyr)
library(rlang)

desired_var <- "Sepal.Length"

iris %>% 
  group_by(Species) %>% 
  summarise(desired_mean = mean(!!parse_expr(desired_var))) %>%
  ungroup

giving:

# A tibble: 3 x 2
     Species desired_mean
      <fctr>        <dbl>
1     setosa        5.006
2 versicolor        5.936
3  virginica        6.588
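A closely related option is to turn the string into a symbol instead of parsing it as R code. This is a sketch that assumes a dplyr/rlang version providing rlang::sym() and the .data pronoun (dplyr 0.7 and later should). Because sym() treats the whole string as a column name, it also copes with non-syntactic names such as ones containing spaces, which parse_expr() would try, and fail, to parse as code.

```r
library(dplyr)
library(rlang)

desired_var <- "Sepal.Length"

# sym() converts the string into a symbol (a bare column name); !! then
# splices it into summarise() as if Sepal.Length had been typed directly
by_sym <- iris %>%
  group_by(Species) %>%
  summarise(desired_mean = mean(!!sym(desired_var))) %>%
  ungroup()

# The .data pronoun looks the column up by its string name, no unquoting needed
by_pronoun <- iris %>%
  group_by(Species) %>%
  summarise(desired_mean = mean(.data[[desired_var]])) %>%
  ungroup()

print(by_sym)
print(by_pronoun)
```

Both versions should give the same three group means as the parse_expr() version.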

2) summarise_at. As @Phil points out in the comments, in the particular case of summarise this can be done like this:

library(dplyr)

desired_var <- "Sepal.Length"

iris %>%
  group_by(Species) %>%
  summarise_at(desired_var, funs(mean)) %>%
  ungroup

giving:

# A tibble: 3 x 2
     Species Sepal.Length
      <fctr>        <dbl>
1     setosa        5.006
2 versicolor        5.936
3  virginica        6.588
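In later dplyr releases funs() is deprecated (since 0.8.0) and summarise_at() is superseded (since 1.0.0). Assuming dplyr 1.0.0 or later, the same summary can be sketched with across() and the tidyselect helper all_of(), which selects columns named by a character vector and raises an error if a named column is missing:

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where across() is available

desired_var <- "Sepal.Length"

# all_of() selects the column named in the string; across() applies mean()
# to it and keeps the original column name for the result
res_across <- iris %>%
  group_by(Species) %>%
  summarise(across(all_of(desired_var), mean)) %>%
  ungroup()

print(res_across)
```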

3) Dynamic variable and name with !!. If you also need to set the output column name dynamically in (1), unquote the name on the left-hand side with := like this:

library(dplyr)
library(rlang)

desired_var <- "Sepal.Length"

desired_var_name <- paste("mean", desired_var, sep = "_")

iris %>% 
  group_by(Species) %>% 
  summarise(!!desired_var_name := mean(!!parse_expr(desired_var))) %>%
  ungroup

giving:

# A tibble: 3 x 2
     Species mean_Sepal.Length
      <fctr>             <dbl>
1     setosa             5.006
2 versicolor             5.936
3  virginica             6.588
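Since the original problem is that the target column is always the first one but its name changes from database to database, the pieces from (1) and (3) can be wrapped in a function. This is a minimal sketch: mean_first_col() and its group_var argument are hypothetical names chosen for illustration, and it uses sym() rather than parse_expr() for the column lookups.

```r
library(dplyr)
library(rlang)

# Hypothetical helper (not part of dplyr): mean of whatever column sits first
# in `df`, grouped by the column named in the string `group_var`, with the
# result column named "mean_<first column name>"
mean_first_col <- function(df, group_var) {
  value_var <- colnames(df)[1]                      # name known only at run time
  out_name  <- paste("mean", value_var, sep = "_")  # e.g. "mean_Sepal.Length"

  df %>%
    group_by(!!sym(group_var)) %>%
    summarise(!!out_name := mean(!!sym(value_var))) %>%
    ungroup()
}

res_first <- mean_first_col(iris, "Species")
print(res_first)
```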
    
Thanks! !!!parse_expr does the trick for a variety of other situations where I need this (e.g. grouping by a string as well). – DanO

I'm having trouble understanding what advantage this has over summarize_at. It seems like a more complex way to achieve the same result. – Phil

@Rich Scriven, be sure you are using the latest version of each package. This is recently added functionality. – G. Grothendieck

@Phil, I have added your suggestion as (2). – G. Grothendieck

@Rich_Scriven, either !! or !!! should work. !!! is used for splicing multiple arguments, but here we have just one, so !! is sufficient. I have simplified the answer to use just !!, although !!! was not wrong and does work. – G. Grothendieck
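DanO's comment mentions grouping by strings as well, and the comment just above contrasts !! with !!!. Here is a minimal sketch of both, assuming the grouping columns arrive as a character vector: rlang::syms() converts each string to a symbol, and !!! splices the resulting list into group_by() as separate arguments. mtcars is used only because iris has a single grouping column.

```r
library(dplyr)
library(rlang)

group_vars <- c("cyl", "gear")  # grouping columns known only as strings
value_var  <- "mpg"             # summarised column, also a string

# syms() returns a list of symbols; !!! splices each one into group_by()
# as its own argument, equivalent to group_by(cyl, gear).
# A single column name only needs sym() and !!.
res_grouped <- mtcars %>%
  group_by(!!!syms(group_vars)) %>%
  summarise(mean_mpg = mean(!!sym(value_var))) %>%
  ungroup()

print(res_grouped)
```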

You're wandering into tidyeval, which is a rather new feature of the tidyverse (see here), used mostly to write functions that call tidyverse functions. For now it is only available in dplyr, but the plan is to extend it to the other tidyverse packages.

For your needs, though, you don't really have to get into that, since summarize_at will do. This function applies an operation that you specify to any variables of your choosing:

iris %>% 
  group_by(Species) %>% 
  summarise_at(vars(one_of("Sepal.Length", "Sepal.Width")), funs(desired_mean = mean))

# A tibble: 3 x 3
     Species Sepal.Length_desired_mean Sepal.Width_desired_mean
      <fctr>                     <dbl>                    <dbl>
1     setosa                     5.006                    3.428
2 versicolor                     5.936                    2.770
3  virginica                     6.588                    2.974

You can store the variable names in a vector and then use that vector instead:

selected_vectors <- c("Sepal.Length", "Sepal.Width")
iris %>% 
  group_by(Species) %>% 
  summarise_at(vars(one_of(selected_vectors)), funs(desired_mean = mean))
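On dplyr 1.0.0 or later, where summarise_at() and funs() are superseded, the same multi-column summary can be sketched with across(). This assumes a recent dplyr and uses the .names glue specification to reproduce the _desired_mean suffix that funs(desired_mean = mean) produces above.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for across()

selected_vectors <- c("Sepal.Length", "Sepal.Width")

# all_of() selects the columns named in the character vector; .names builds
# each output name from the input column name ({.col}) plus a fixed suffix
res_multi <- iris %>%
  group_by(Species) %>%
  summarise(across(all_of(selected_vectors), mean,
                   .names = "{.col}_desired_mean")) %>%
  ungroup()

print(res_multi)
```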
    
That does it, thanks! – DanO