Create a grid and color each cell by the average of its scatter plot points using ggplot2
Given a numerical dataset {(x_i, y_i, z_i)} with N points, it is possible to create a scatter plot by drawing a point P_i = (x_i, y_i) for each i = 1, ..., N and coloring each point with an intensity that depends on the value z_i.
library(ggplot2)
N = 1000;
dfA = data.frame(runif(N), runif(N), runif(N))
dfB = data.frame(runif(N), runif(N), runif(N))
names(dfA) = c("x", "y", "z")
names(dfB) = c("x", "y", "z")
PlotA <- ggplot(data = dfA, aes(x = x, y = y)) + geom_point(aes(colour = z));
PlotB <- ggplot(data = dfB, aes(x = x, y = y)) + geom_point(aes(colour = z));
Let's assume I have created these scatter plots. What I would like to do for each dataset is divide the plane into a grid (rectangular, hexagonal, triangular, ... it doesn't matter which) and color each grid cell with the average intensity of all points that fall inside the cell.
Also, suppose I have created two such plots, PlotA and PlotB (as above), for two different datasets dfA and dfB. Let c_i^k be the i-th cell of plot k. I want to create a third plot such that c_i^3 = c_i^1 * c_i^2 for each i.
Thanks.
EDIT: minimal example
Splitting the plane and calculating summaries over rectangles is pretty straightforward with the stat_summary2d function. First, I'm going to create explicit breaks, rather than letting ggplot choose them, so that they are exactly the same for both plots.
bb<-seq(0,1,length.out=10+1)
breaks<-list(x=bb, y=bb)
p1 <- ggplot(data = dfA, aes(x = x, y = y, z=z)) +
stat_summary2d(fun=mean, breaks=breaks) + ggtitle("A");
p2 <- ggplot(data = dfB, aes(x = x, y = y, z=z)) +
stat_summary2d(fun=mean, breaks=breaks) + ggtitle("B");
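For reference, the per-cell mean that the summary layer draws can be approximated in base R. This is only a minimal sketch, assuming the same breaks bb as above; cell_means is a hypothetical helper name (not part of ggplot2), and points lying exactly on a bin boundary may be assigned differently than ggplot2 assigns them.

```r
# Hypothetical helper: mean of z in each rectangular cell defined by `breaks`.
cell_means <- function(df, breaks) {
  # Assign each point an integer bin index along x and along y
  xbin <- cut(df$x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
  ybin <- cut(df$y, breaks = breaks, include.lowest = TRUE, labels = FALSE)
  # Average z within each (xbin, ybin) pair; cells with no points are absent
  aggregate(data.frame(value = df$z),
            by = list(xbin = xbin, ybin = ybin),
            FUN = mean)
}

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
N <- 1000
dfA <- data.frame(x = runif(N), y = runif(N), z = runif(N))
bb <- seq(0, 1, length.out = 10 + 1)
mA <- cell_means(dfA, bb)
head(mA)
```

The result has one row per non-empty cell with columns xbin, ybin and value, which is the same shape as the data extracted from the plots below.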
Then, to get the difference, it's a little messy, but we can extract the data from the plots we've already created and combine them.
#get data
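# select columns by name; their positions can differ between ggplot2 versions
d1 <- ggplot_build(p1)$data[[1]][, c("xbin", "ybin", "value")]
d2 <- ggplot_build(p2)$data[[1]][, c("xbin", "ybin", "value")]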
mm <- merge(d1, d2, by=c("xbin","ybin"))
#turn factor back into numeric values
mids <- diff(bb)/2+bb[-length(bb)]
#plot difference
ggplot(mm, aes(x=mids[xbin], y=mids[ybin], fill=value.x-value.y)) +
geom_tile() + scale_fill_gradient2(name="diff") + labs(x="x",y="y")
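The question asks for the product of the two cell values rather than their difference. A minimal sketch of that variant follows, assuming a merged data frame shaped like mm above (columns xbin, ybin, value.x, value.y). The toy values and the names prod and p3 are made up purely for illustration, and the plot is only built if ggplot2 is installed.

```r
# Toy stand-in for the merged data frame mm (values are illustrative only)
bb <- seq(0, 1, length.out = 2 + 1)
mids <- diff(bb) / 2 + bb[-length(bb)]  # cell midpoints: 0.25, 0.75
mm <- data.frame(
  xbin    = c(1, 1, 2, 2),
  ybin    = c(1, 2, 1, 2),
  value.x = c(0.2, 0.4, 0.6, 0.8),
  value.y = c(0.5, 0.5, 0.5, 0.5)
)

# Product of the two cell averages, one value per cell
mm$prod <- mm$value.x * mm$value.y

# Build the tile plot only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p3 <- ggplot(mm, aes(x = mids[xbin], y = mids[ybin], fill = prod)) +
    geom_tile() + scale_fill_gradient(name = "product") + labs(x = "x", y = "y")
}
```

Note that merge() keeps only cells present in both data frames by default, so a cell that is empty in either dataset will be missing from the combined plot.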