Generating a Prediction Grid in a dplyr Pipeline

I hope someone has a solution for using some form of expand.grid in a dplyr pipeline. I am doing some modeling where I have several different groups (Type below), and the groups have different ranges for the x and y data. When I fit a GAM to the data, I want to plot the predictions, but I only want to predict over the range that each group occupies, not the entire range of the dataset.

I have a working example posted below, but I'm wondering whether there is a way to do this without the loop.

Greetings

require(ggplot2)
require(dplyr)

# Create some data
df  = data.frame(Type = rep(c("A","B"), each = 100),
                 x = c(rnorm(100, 0, 1), rnorm(100, 2, 1)),
                 y = c(rnorm(100, 0, 1), rnorm(100, 2, 1)))

# and if you want to check out the data
ggplot(df,aes(x,y,col=Type)) + geom_point() + stat_ellipse()

# OK so I have no issue extracting the minimum and maximum values 
# for each type
df_summ = df %>%
  group_by(Type) %>%
  summarize(xmin = min(x),
            xmax = max(x),
            ymin = min(y),
            ymax = max(y))
df_summ

# and I can create a loop and use the expand.grid function to get my 
# desired output
test = NULL
for(ii in c("A","B")){
  df1 = df_summ[df_summ$Type == ii,]
  x = seq(df1$xmin, df1$xmax, length.out = 10)
  y = seq(df1$ymin, df1$ymax, length.out = 10)
  coords = expand.grid(x = x, y = y)
  coords$Type = ii
  test = rbind(test, coords)
}

ggplot(test, aes(x,y,col = Type)) + geom_point()

      

But what I would really like is to avoid the loop and get the same result directly within the pipeline. I have tried several combinations using the do() function without success; the one below is just one of many failed attempts.

df %>%
  group_by(Type) %>%
  summarize(xmin = min(x),
            xmax = max(x),
            ymin = min(y),
            ymax = max(y)) %>%
  do(data.frame(x = seq(xmin, xmax, length.out = 10),
                y = seq(ymin, ymax, length.out = 10)))

# this last line returns an error
# Error in is.finite(from) : 
#   default method not implemented for type 'closure'

      



2 answers


Your attempt with do() was almost right. The trick is to regroup after summarize(), which drops the grouping here (it removes the last grouping level, and Type is the only one). You also need to make sure that you take the values from the data in the chain with .$. Try this:

test <- df %>%
  group_by(Type) %>%
  summarize(xmin = min(x),
            xmax = max(x),
            ymin = min(y),
            ymax = max(y)) %>%
  group_by(Type) %>%
  do(expand.grid(x = seq(.$xmin, .$xmax, length.out = 10),
                 y = seq(.$ymin, .$ymax, length.out = 10)))
ggplot(test, aes(x,y,col = Type)) + geom_point()
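
As a possible alternative to do(), which newer dplyr releases mark as superseded, the same grid can be sketched with reframe(). This is not part of the original answer and assumes dplyr 1.1.0 or later, where reframe() is available. The seed, the name pred_grid, and the 10 points per axis are illustrative choices.

```r
library(dplyr)

# Same simulated data as in the question; the seed is only for reproducibility
set.seed(1)
df <- data.frame(Type = rep(c("A", "B"), each = 100),
                 x = c(rnorm(100, 0, 1), rnorm(100, 2, 1)),
                 y = c(rnorm(100, 0, 1), rnorm(100, 2, 1)))

# reframe() may return any number of rows per group, so the separate
# summarize() and regrouping steps are not needed here.
# KEEP.OUT.ATTRS = FALSE drops expand.grid's extra bookkeeping attribute.
pred_grid <- df %>%
  group_by(Type) %>%
  reframe(expand.grid(x = seq(min(x), max(x), length.out = 10),
                      y = seq(min(y), max(y), length.out = 10),
                      KEEP.OUT.ATTRS = FALSE))
```

The result has the columns Type, x and y, with a 10 x 10 grid per group, and can be passed to ggplot() in the same way as test above.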

      





Using the data_grid function from the modelr package, here's one way to do it:

library(dplyr)
library(modelr)
library(ggplot2)

df %>%
  group_by(Type) %>%
  data_grid(x, y) %>%
  ggplot(aes(x, y, color = Type)) + geom_point()

      




For each group, this approach generates a row for every combination of an x value and a y value in that group. Thus, each x-y pair in the resulting data frame is built only from x and y values that actually appear in the data.
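
If an evenly spaced grid over each group's range is wanted instead, as in the question's loop, one option is to combine data_grid with modelr's seq_range helper. The sketch below assumes that expressions passed to data_grid are evaluated within each group, just as the bare column names are in the code above. The seed, the name even_grid, and the 10 points per axis are illustrative choices.

```r
library(dplyr)
library(modelr)

# Same simulated data as in the question; the seed is only for reproducibility
set.seed(1)
df <- data.frame(Type = rep(c("A", "B"), each = 100),
                 x = c(rnorm(100, 0, 1), rnorm(100, 2, 1)),
                 y = c(rnorm(100, 0, 1), rnorm(100, 2, 1)))

# seq_range(x, 10) gives 10 evenly spaced values between min(x) and max(x);
# within a grouped data frame those limits come from each Type separately
even_grid <- df %>%
  group_by(Type) %>%
  data_grid(x = seq_range(x, 10),
            y = seq_range(y, 10))
```

This keeps the grid small (10 x 10 points per group) rather than crossing every observed x with every observed y.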
