Why will geom_tile plot a subset of my data, but not more?
I'm trying to plot a map, but I can't figure out why the following won't work.
Here is a minimal example:
testdf <- structure(list(x = c(48.97, 44.22, 44.99, 48.87, 43.82, 43.16, 38.96, 38.49, 44.98, 43.9), y = c(-119.7, -113.7, -109.3, -120.6, -109.6, -121.2, -114.2, -118.9, -109.7, -114.1), z = c(0.001216, 0.001631, 0.001801, 0.002081, 0.002158, 0.002265, 0.002298, 0.002334, 0.002349, 0.00249)), .Names = c("x", "y", "z"), row.names = c(NA, 10L), class = "data.frame")
This works for rows 1-8:
ggplot(data = testdf[1,], aes(x,y,fill = z)) + geom_tile()
ggplot(data = testdf[1:8,], aes(x,y,fill = z)) + geom_tile()
But not for 9 rows:
ggplot(data = testdf[1:9,], aes(x,y,fill = z)) + geom_tile()
Ultimately I am looking for a way to plot the data on an irregular grid. It doesn't have to be geom_tile; any spatial interpolation over the points will do.
The complete dataset is available as a gist. The testdf above is a small subset of the full dataset, a high-resolution US raster (> 7500 rows):
require(RCurl) # requires libcurl; sudo apt-get install libcurl4-openssl-dev
tmp <- getURL("https://gist.github.com/raw/4635980/f657dcdfab7b951c7b8b921b3a109c7df1697eb8/test.csv")
testdf <- read.csv(textConnection(tmp)) # read from tmp, the object returned by getURL()
What I have tried:
- Using geom_point works, but doesn't have the desired effect:
ggplot(data = testdf, aes(x,y,color=z)) + geom_point()
- If I convert x or y to a 1:10 vector, the plot works as expected:
newdf <- transform(testdf, y = 1:10)
ggplot(data = newdf[1:9,], aes(x,y,fill = z)) + geom_tile()
newdf <- transform(testdf, x = 1:10)
ggplot(data = newdf[1:9,], aes(x,y,fill = z)) + geom_tile()
sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-pc-linux-gnu (64-bit)

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] reshape2_1.2.2 maps_2.3-0 betymaps_1.0 ggmap_2.2 ggplot2_0.9.3

loaded via a namespace (and not attached):
[1] colorspace_1.2-0 dichromat_1.2-4 digest_0.6.1 grid_2.15.2 gtable_0.1.2 labeling_0.1
[7] MASS_7.3-23 munsell_0.4 plyr_1.8 png_0.1-4 proto_0.3-10 RColorBrewer_1.0-5
[13] RgoogleMaps_1.2.0.2 rjson_0.2.12 scales_0.2.3 stringr_0.6.2 tools_2.15.2
The reason you can't use geom_tile() (or the more appropriate geom_raster()) is that these two geoms rely on your tiles being evenly spaced, which is not the case here. You will need to coerce your data to points and resample them onto an evenly spaced raster, which you can then plot with geom_raster(). You will have to accept that the original data must be resampled slightly in order to plot it the way you want.
You should also read up on raster:::projection and rgdal:::spTransform for more information on map projections.
require( RCurl )
require( raster )
require( sp )
require( ggplot2 )
tmp <- getURL("https://gist.github.com/geophtwombly/4635980/raw/f657dcdfab7b951c7b8b921b3a109c7df1697eb8/test.csv")
testdf <- read.csv(textConnection(tmp))
spdf <- SpatialPointsDataFrame( data.frame( x = testdf$y , y = testdf$x ) , data = data.frame( z = testdf$z ) )
# Plotting the points reveals the unevenly spaced nature of the points
spplot(spdf)
# You can see the uneven nature of the data even better here via the moire pattern
plot(spdf)
# Make an evenly spaced raster, the same extent as original data
e <- extent( spdf )
# Determine ratio between x and y dimensions
ratio <- ( e@xmax - e@xmin ) / ( e@ymax - e@ymin )
# Create template raster to sample to
r <- raster( nrows = 56 , ncols = floor( 56 * ratio ) , ext = extent(spdf) )
rf <- rasterize( spdf , r , field = "z" , fun = mean )
# Attributes of our new raster (# cells quite close to original data)
rf
# class : RasterLayer
# dimensions : 56, 135, 7560 (nrow, ncol, ncell)
# resolution : 0.424932, 0.4248191 (x, y)
# extent : -124.5008, -67.13498, 25.21298, 49.00285 (xmin, xmax, ymin, ymax)
# We can then plot this using `geom_tile()` or `geom_raster()`
rdf <- data.frame( rasterToPoints( rf ) )
ggplot( NULL ) + geom_raster( data = rdf , aes( x , y , fill = layer ) )
# And as the OP asked for geom_tile, this would be...
ggplot( NULL ) + geom_tile( data = rdf , aes( x , y , fill = layer ) , colour = "white" )
Of course, I should add that this data is fairly meaningless as it stands. What you really need to do is take the SpatialPointsDataFrame, assign the correct projection information to it, transform it to lat-long coordinates via spTransform, and then rasterize the transformed points. For that you need more information about your raster data. What you have here is a close approximation, but ultimately not a true reflection of the data.
This is not an answer to the geom_tile() problem, but a different way of plotting the data.
Since you have the x and y coordinates of a 30 km grid (I assume they are the centres of the grid cells), you can use geom_point() to plot the data. You need to choose an appropriate shape= value; shape 15 plots filled squares.
Another problem is the x and y values: when plotting, they must be swapped (x=y and y=x) so that they correspond to longitude and latitude.
coord_equal() ensures the correct aspect ratio (I found this aspect-ratio solution in an example on the net).
ggplot(data = testdf, aes(y,x,colour=z)) + geom_point(shape=15)+
coord_equal(ratio=1/cos(mean(testdf$x)*pi/180))
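The ratio passed to coord_equal() compensates for a degree of longitude being shorter than a degree of latitude away from the equator. As a rough sketch, assuming a spherical Earth and that testdf$x holds latitude in degrees (lat_aspect is a hypothetical helper name, not part of ggplot2):

```r
# Hypothetical helper: how many times longer one degree of latitude is
# than one degree of longitude at the given latitude (spherical Earth assumed)
lat_aspect <- function(lat_deg) 1 / cos(lat_deg * pi / 180)

ratio_equator <- lat_aspect(0)   # both kinds of degree are equally long at the equator
ratio_60      <- lat_aspect(60)  # a degree of longitude is half as long at 60 degrees
print(c(equator = ratio_equator, lat60 = ratio_60))
```

Using the mean latitude of the data, as in the plot above, gives a single compromise ratio for the whole map.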
Answer:
The data is being plotted, but the tiles are very small.
"Tile plot as densely as possible, assuming that every tile is the same size."
Consider this plot:
ggplot(data = testdf[1:2,], aes(x,y,fill = z)) + geom_tile()
There are two tiles in the plot above. geom_tile tries to tile the plot as densely as possible, given that every tile is the same size. Here the two tiles can be made this large without overlapping, which leaves room for 4 tiles.
Try the following plots and see what they tell you:
df1 <- data.frame(x=c(1:3),y=(1:3))
# df1
# x y
#1 1 1
#2 2 2
#3 3 3
ggplot(data = df1[1,], aes(x,y)) + geom_tile()
ggplot(data = df1[1:2,], aes(x,y)) + geom_tile()
ggplot(data = df1[1:3,], aes(x,y)) + geom_tile()
Compare with this example:
df2 <- data.frame(x=c(1:3),y=c(1,20,300))
df2
# x y
#1 1 1
#2 2 20
#3 3 300
ggplot(data = df2[1,], aes(x,y)) + geom_tile()
ggplot(data = df2[1:2,], aes(x,y)) + geom_tile()
ggplot(data = df2[1:3,], aes(x,y)) + geom_tile()
Note that the first two plots are the same for df1 and df2, but the third plot for df2 is different. This is because the largest tiles we can make are limited by the gap between (x[1],y[1]) and (x[2],y[2]). Anything larger and they would overlap, which leaves a lot of empty space between these two tiles and the third tile up at y=300.
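The shrinking tiles can be checked numerically on the data from the question. The following is a minimal sketch, assuming that geom_tile derives its tile size from the smallest gap between distinct coordinate values when no width or height is given; smallest_gap is a hypothetical helper written for illustration, not a ggplot2 function.

```r
# First nine rows of testdf from the question
x <- c(48.97, 44.22, 44.99, 48.87, 43.82, 43.16, 38.96, 38.49, 44.98)
y <- c(-119.7, -113.7, -109.3, -120.6, -109.6, -121.2, -114.2, -118.9, -109.7)

# Hypothetical helper: smallest distance between two distinct values
smallest_gap <- function(v) min(diff(sort(unique(v))))

gap_x_8 <- smallest_gap(x[1:8])  # rows 1-8, the plot that "works"
gap_y_8 <- smallest_gap(y[1:8])
gap_x_9 <- smallest_gap(x[1:9])  # rows 1-9, the plot that looks empty
gap_y_9 <- smallest_gap(y[1:9])

print(c(x_rows_1_to_8 = gap_x_8, x_rows_1_to_9 = gap_x_9))
print(c(y_rows_1_to_8 = gap_y_8, y_rows_1_to_9 = gap_y_9))
```

Under that assumption, row 9 lies within 0.01 of row 3 in x and within 0.1 of row 5 in y, so adding it makes every tile roughly ten times narrower and three times shorter, which is why the nine-row plot appears blank.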
geom_tile also has a width parameter, although I'm not sure how sensible it is to use here. Are you sure you wouldn't prefer another option with such sparse data?
(Your complete data is still plotted: see ggplot(data = testdf, aes(x,y)) + geom_tile(width=1000).)
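If the tiles should have a fixed size regardless of how close neighbouring points are, both width and height can be set explicitly. A minimal sketch, assuming ggplot2 is installed and that half a unit is an acceptable tile size for this data (0.5 is an arbitrary value chosen for illustration); tiles that overlap simply draw over each other:

```r
library(ggplot2)

# First nine rows of testdf from the question
testdf9 <- data.frame(
  x = c(48.97, 44.22, 44.99, 48.87, 43.82, 43.16, 38.96, 38.49, 44.98),
  y = c(-119.7, -113.7, -109.3, -120.6, -109.6, -121.2, -114.2, -118.9, -109.7),
  z = c(0.001216, 0.001631, 0.001801, 0.002081, 0.002158,
        0.002265, 0.002298, 0.002334, 0.002349)
)

# Fix the tile size instead of letting it be derived from the data spacing
p <- ggplot(testdf9, aes(x, y, fill = z)) +
  geom_tile(width = 0.5, height = 0.5)

# Inspect the rectangles ggplot2 will draw, then draw them
tiles <- layer_data(p)
print(p)
```

This only changes how the points are drawn; it does not interpolate between them, so resampling onto a regular grid is still the better route for a real map.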
If you want to use geom_tile I think you will need to aggregate first:
# NOTE: tmp.csv downloaded from https://gist.github.com/geophtwombly/4635980/raw/f657dcdfab7b951c7b8b921b3a109c7df1697eb8/test.csv
testdf <- read.csv("~/Desktop/tmp.csv")
# combine x,y coordinates by rounding
testdf$x2 <- round(testdf$x, digits=0)
testdf$y2 <- round(testdf$y, digits=0)
# aggregate on combined coordinates
library(plyr)
testdf <- ddply(testdf, c("x2", "y2"), summarize,
z = mean(z))
# plot aggregated data using geom_tile
ggplot(data = testdf, aes(y2,x2,fill=z)) +
geom_tile() +
coord_equal(ratio=1/cos(mean(testdf$x2)*pi/180)) # copied from @Didzis Elferts answer--nice!
Having done that, you will probably conclude that geom_point() is the better choice, as suggested by @Didzis Elferts.