Why will geom_tile plot a subset of my data, but no more?

I'm trying to make a map, but I can't figure out why the following won't work.

Here is a minimal example:

testdf <- structure(list(x = c(48.97, 44.22, 44.99, 48.87, 43.82, 43.16, 38.96, 38.49, 44.98, 43.9),
                         y = c(-119.7, -113.7, -109.3, -120.6, -109.6, -121.2, -114.2, -118.9, -109.7, -114.1),
                         z = c(0.001216, 0.001631, 0.001801, 0.002081, 0.002158, 0.002265, 0.002298, 0.002334, 0.002349, 0.00249)),
                    .Names = c("x", "y", "z"), row.names = c(NA, 10L), class = "data.frame")

This works for rows 1 through 8:

ggplot(data = testdf[1,], aes(x,y,fill = z)) + geom_tile()
ggplot(data = testdf[1:8,], aes(x,y,fill = z)) + geom_tile()

      

But not for 9 rows:

ggplot(data = testdf[1:9,], aes(x,y,fill = z)) + geom_tile()

      

Ultimately I am looking for a way to plot the data on an irregular grid. It doesn't matter whether I use geom_tile; any spatial interpolation over the points will do.

The complete dataset is available as a gist.

The testdf above was a small subset of the full dataset, a high-resolution US raster (> 7500 rows):

require(RCurl) # requires libcurl; sudo apt-get install libcurl4-openssl-dev
tmp <- getURL("https://gist.github.com/raw/4635980/f657dcdfab7b951c7b8b921b3a109c7df1697eb8/test.csv")
testdf <- read.csv(textConnection(tmp))

What I have tried:

  • Using geom_point works, but doesn't have the desired effect:

    ggplot(data = testdf, aes(x,y,color=z)) + geom_point()
    
          

  • If I convert x or y to a 1:10 vector the graph works as expected:

    newdf <- transform(testdf, y =1:10)
    
    ggplot(data = newdf[1:9,], aes(x,y,fill = z)) + geom_tile()
    
    newdf <- transform(testdf, x =1:10)
    ggplot(data = newdf[1:9,], aes(x,y,fill = z)) + geom_tile()
    
          


> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-pc-linux-gnu (64-bit)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] reshape2_1.2.2 maps_2.3-0     betymaps_1.0   ggmap_2.2      ggplot2_0.9.3

loaded via a namespace (and not attached):
 [1] colorspace_1.2-0    dichromat_1.2-4     digest_0.6.1        grid_2.15.2         gtable_0.1.2        labeling_0.1
 [7] MASS_7.3-23         munsell_0.4         plyr_1.8            png_0.1-4           proto_0.3-10        RColorBrewer_1.0-5
[13] RgoogleMaps_1.2.0.2 rjson_0.2.12        scales_0.2.3        stringr_0.6.2       tools_2.15.2


4 answers


The reason you can't use geom_tile() (or the more appropriate geom_raster()) is that these two geoms rely on your tiles being evenly spaced, which yours are not. You will need to coerce your data to points and resample them onto an evenly spaced raster, which you can then plot with geom_raster(). You will have to accept that this means resampling your original data slightly in order to plot it the way you want.

You should also read up on raster:::projection and rgdal:::spTransform for more information on map projections.
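The resampling idea can be sketched in plain base R. This is a hypothetical helper (`bin_to_grid` is not part of any package) that assumes simple equal-width cells over the data's bounding box and averages `z` within each cell, which is roughly what the `rasterize(..., fun = mean)` call in this answer does; it ignores map projections entirely.

```r
# Hypothetical helper, not from any package: average irregular points
# into an nx-by-ny grid of equal-width cells spanning the data's extent.
bin_to_grid <- function(x, y, z, nx, ny) {
  # Cell boundaries along each axis
  xbreaks <- seq(min(x), max(x), length.out = nx + 1)
  ybreaks <- seq(min(y), max(y), length.out = ny + 1)
  # Column/row index of the cell each point falls into;
  # rightmost.closed puts points on the max edge into the last cell
  ix <- findInterval(x, xbreaks, rightmost.closed = TRUE)
  iy <- findInterval(y, ybreaks, rightmost.closed = TRUE)
  # Mean z per occupied cell (empty cells are simply absent)
  agg <- aggregate(data.frame(z = z), by = list(ix = ix, iy = iy), FUN = mean)
  # Convert cell indices to cell-centre coordinates
  dx <- xbreaks[2] - xbreaks[1]
  dy <- ybreaks[2] - ybreaks[1]
  data.frame(x = xbreaks[agg$ix] + dx / 2,
             y = ybreaks[agg$iy] + dy / 2,
             z = agg$z)
}

# Toy example: three irregular points binned onto a 2 x 2 grid
g <- bin_to_grid(x = c(0.1, 0.2, 0.9), y = c(0.1, 0.2, 0.9),
                 z = c(1, 3, 5), nx = 2, ny = 2)
g
```

The result has one row per occupied cell at evenly spaced centres, which is the shape of data geom_tile() and geom_raster() expect.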

require( RCurl )
require( raster )
require( sp )
require( ggplot2 )
tmp <- getURL("https://gist.github.com/geophtwombly/4635980/raw/f657dcdfab7b951c7b8b921b3a109c7df1697eb8/test.csv")
testdf <- read.csv(textConnection(tmp))
spdf <- SpatialPointsDataFrame( data.frame( x = testdf$y , y = testdf$x ) , data = data.frame( z = testdf$z ) )

# Plotting the points reveals the unevenly spaced nature of the points
spplot(spdf)

      


# You can see the uneven nature of the data even better here via the moire pattern
plot(spdf)

      




# Make an evenly spaced raster, the same extent as original data
e <- extent( spdf )

# Determine ratio between x and y dimensions
ratio <- ( e@xmax - e@xmin ) / ( e@ymax - e@ymin )

# Create template raster to sample to
r <- raster( nrows = 56 , ncols = floor( 56 * ratio ) , ext = extent(spdf) )
rf <- rasterize( spdf , r , field = "z" , fun = mean )

# Attributes of our new raster (the number of cells is close to the number of original data points)
rf
# class       : RasterLayer 
# dimensions  : 56, 135, 7560  (nrow, ncol, ncell)
# resolution  : 0.424932, 0.4248191  (x, y)
# extent      : -124.5008, -67.13498, 25.21298, 49.00285  (xmin, xmax, ymin, ymax)
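The template dimensions can be checked by hand. This small sketch re-derives them from the extent values printed above (the numbers are copied from that output, so it only reproduces the arithmetic behind what `raster()` reported):

```r
# Extent values copied from the printed RasterLayer above
xmin <- -124.5008; xmax <- -67.13498
ymin <- 25.21298;  ymax <- 49.00285

ratio <- (xmax - xmin) / (ymax - ymin)  # width / height, about 2.41
nrows <- 56
ncols <- floor(nrows * ratio)           # 135 columns

# Cell size follows from extent divided by the number of cells
res_x <- (xmax - xmin) / ncols
res_y <- (ymax - ymin) / nrows
c(ncols = ncols, ncell = nrows * ncols, res_x = res_x, res_y = res_y)
```

Choosing the column count from the extent ratio keeps the cells close to square, which is why the x and y resolutions nearly match.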

# We can then plot this using `geom_tile()` or `geom_raster()`
rdf <- data.frame( rasterToPoints( rf ) )    
ggplot( NULL ) + geom_raster( data = rdf , aes( x , y , fill = layer ) )

      


# And as the OP asked for geom_tile, this would be...
ggplot( NULL ) + geom_tile( data = rdf , aes( x , y , fill = layer ) , colour = "white" )

      


Of course, I should add that this plot is rather meaningless. What you really need to do is take the SpatialPointsDataFrame, assign the correct projection information to it, transform it to lat/long coordinates with spTransform, and then rasterize the transformed points. You really need more information about your raster data. What you have here is a close approximation, but ultimately it is not a true reflection of the data.



This is not an answer to the geom_tile() problem, but a different way of plotting the data.

Since you have the x and y coordinates of a 30 km grid (I assume they are the centres of the grid cells), you can use geom_point() to display the data. You need to choose an appropriate shape= value; shape 15 draws filled squares.

Another problem is the x and y values: when plotting, they must be mapped as x=y and y=x so that longitude is on the horizontal axis and latitude on the vertical axis.



coord_equal() with the ratio below gives the correct aspect ratio (I found this aspect-ratio solution in an example online).

ggplot(data = testdf, aes(y,x,colour=z)) + geom_point(shape=15)+
  coord_equal(ratio=1/cos(mean(testdf$x)*pi/180))
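The ratio passed to coord_equal() comes from the fact that, on a spherical Earth, a degree of longitude is cos(latitude) times as long as a degree of latitude, so the latitude axis has to be stretched by 1/cos(latitude), here evaluated at the mean latitude. A small sketch; `lat_aspect()` is a hypothetical helper name, not a ggplot2 function:

```r
# Hypothetical helper: y/x aspect ratio for a lon/lat plot, using a
# spherical-Earth approximation at the mean latitude (in degrees)
lat_aspect <- function(lat_deg) {
  1 / cos(mean(lat_deg) * pi / 180)
}

lat_aspect(0)                # equator: no correction needed
lat_aspect(60)               # a degree of longitude is half as long here
lat_aspect(c(38.49, 48.97))  # mid-latitude US data: about 1.38
```

A single mean-latitude correction is only an approximation; for data spanning many degrees of latitude a proper map projection (e.g. via spTransform, as mentioned in the other answer) is more accurate.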

      



Answer: the data is being plotted, but the tiles are very small.

The key is that geom_tile will "tile plot as densely as possible, assuming that every tile is the same size."

Consider this plot:

ggplot(data = testdf[1:2,], aes(x,y,fill = z)) + geom_tile()

      


There are two tiles in the plot above. geom_tile tries to make the plot as dense as possible, given that every tile is the same size. Here the two tiles can be made this big without overlapping, which makes the plot area large enough for 4 tiles (2 by 2), only two of which are filled.

Try the following and see what the plots tell you:

df1 <- data.frame(x=c(1:3),y=(1:3))
#     df1
#  x   y
#1 1   1
#2 2   2
#3 3   3
ggplot(data = df1[1,], aes(x,y)) + geom_tile()   
ggplot(data = df1[1:2,], aes(x,y)) + geom_tile() 
ggplot(data = df1[1:3,], aes(x,y)) + geom_tile()

      

Compare with this example:

df2 <- data.frame(x=c(1:3),y=c(1,20,300))
df2
#  x   y
#1 1   1
#2 2  20
#3 3 300

ggplot(data = df2[1,], aes(x,y)) + geom_tile()
ggplot(data = df2[1:2,], aes(x,y)) + geom_tile()
ggplot(data = df2[1:3,], aes(x,y)) + geom_tile()

      

Note that the first two plots look the same for df1 and df2, but the third plot for df2 is different. This is because the tile size is limited by the distance between (x[1], y[1]) and (x[2], y[2]): any bigger, and those two tiles would overlap. That leaves a lot of space between these two tiles and the third tile at y = 300.
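The arithmetic behind the third df2 plot, as a base-R sketch. It assumes, as described above, that geom_tile gives every tile the same height, taken as the smallest gap between adjacent distinct y values (this is the idea behind ggplot2's `resolution()` helper; the exact default may differ between ggplot2 versions):

```r
# y values from df2 above
y <- c(1, 20, 300)

# Assumed default tile height: smallest gap between adjacent distinct values
h <- min(diff(sort(unique(y))))   # 19

# Each tile is centred on its y value and extends h/2 on either side
tiles <- data.frame(y = y, ymin = y - h / 2, ymax = y + h / 2)
tiles

# Tiles 1 and 2 meet exactly at y = 10.5 (any taller and they would overlap);
# the gap between tile 2 and tile 3 is 261 units of empty space
tiles$ymin[3] - tiles$ymax[2]
```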

geom_tile does have a width parameter, although I'm not sure how sensible it is here. With data this sparse, are you sure you wouldn't prefer another option?

(Your complete data is still plotted; see ggplot(data = testdf, aes(x,y)) + geom_tile(width=1000).)
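Applied to the question's 10-row testdf, the same reasoning explains the jump between 8 and 9 rows: row 9 adds x = 44.98 right next to the existing 44.99. A base-R sketch, assuming geom_tile's default tile width (height) is the smallest gap between adjacent distinct x (y) values; `min_gap()` and `rel_size()` are hypothetical helpers, not ggplot2 functions:

```r
# x (latitude) and y (longitude) columns of the 10-row testdf in the question
x <- c(48.97, 44.22, 44.99, 48.87, 43.82, 43.16, 38.96, 38.49, 44.98, 43.9)
y <- c(-119.7, -113.7, -109.3, -120.6, -109.6, -121.2, -114.2, -118.9, -109.7, -114.1)

# Hypothetical helper: smallest gap between adjacent distinct values
min_gap <- function(v) min(diff(sort(unique(v))))

# Hypothetical helper: tile size as a fraction of the data's span
rel_size <- function(v) min_gap(v) / diff(range(v))

rel_size(x[1:8]); rel_size(y[1:8])  # about 0.0095 and 0.025 of the span
rel_size(x[1:9]); rel_size(y[1:9])  # about 0.00095 and 0.0084: tiles ~30x smaller in area
```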



If you want to use geom_tile I think you will need to aggregate first:

# NOTE: tmp.csv downloaded from https://gist.github.com/geophtwombly/4635980/raw/f657dcdfab7b951c7b8b921b3a109c7df1697eb8/test.csv
testdf <- read.csv("~/Desktop/tmp.csv") 

# combine x,y coordinates by rounding
testdf$x2 <- round(testdf$x, digits=0)
testdf$y2 <- round(testdf$y, digits=0)

# aggregate on combined coordinates
library(plyr)
testdf <- ddply(testdf, c("x2", "y2"), summarize,
                z = mean(z))

# plot aggregated data using geom_tile
library(ggplot2)
ggplot(data = testdf, aes(y2,x2,fill=z)) +
  geom_tile() +
  coord_equal(ratio=1/cos(mean(testdf$x2)*pi/180)) # copied from @Didzis Elferts answer--nice!
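The same rounding-and-averaging step can be done without plyr. A sketch using base R's `aggregate()` formula interface (a swapped-in alternative to `ddply()`, not what this answer used), on a small made-up data frame standing in for testdf:

```r
# Made-up stand-in for testdf (x = latitude, y = longitude), illustration only
toy <- data.frame(x = c(44.22, 44.49, 44.99, 38.96),
                  y = c(-113.7, -113.9, -109.3, -114.2),
                  z = c(1, 3, 5, 7))

# Snap coordinates to the nearest whole degree
toy$x2 <- round(toy$x)
toy$y2 <- round(toy$y)

# Mean z per 1-degree cell; base-R alternative to the ddply() call above
agg <- aggregate(z ~ x2 + y2, data = toy, FUN = mean)
agg
```

The first two points fall into the same 1-degree cell and are averaged, so four points become three evenly spaced tiles.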

      

Having done that, we will probably conclude that geom_point() is better, as suggested by @Didzis Elferts.
