Loops in R, aggregating data based on different variables

Question

I have a data frame with 2332 rows. I want to find the rows where the variable "POSTAL" is equal, then assign to all of them the values from the row where the variable "area" is the largest.

Here are the first 50 rows:

> data[1:50,]
   POSTAL        x       y         area
0   12920 573385.9 4972933 8.384062e+06
1   12921 623487.7 4971908 8.233541e+07
2   12923 583786.9 4978081 1.474410e+08
3   12924 613452.4 4927788 1.497106e+07
4   12934 588962.9 4965368 2.194386e+08
5   12935 596550.0 4967100 1.888997e+08
6   12944 618378.6 4921592 2.534854e+07
7   12952 583074.3 4953381 2.943473e+07
8   12955 582523.7 4959810 5.204965e+07
9   12958 611949.9 4979674 9.186815e+07
10  12959 601546.4 4979545 1.037816e+08
11  12962 611088.7 4951280 1.079834e+08
12  12972 612442.2 4934335 2.356099e+08
13  12978 595047.1 4941416 9.280316e+06
14  12979 628230.8 4983172 1.076677e+07
15  12981 591559.5 4944906 3.203060e+08
16  12985 599050.4 4935220 1.643595e+08
17  12992 616585.6 4963995 1.989913e+08
18  12997 592669.1 4914134 2.731502e+07
19  12017 627445.1 4686235 4.773138e+07
20  12024 619994.9 4704246 7.021505e+06
21  12029 629805.8 4696477 5.399608e+07
22  12037 618566.6 4688290 9.184531e+07
23  12060 624089.4 4697165 8.745604e+07
24  12062 622755.7 4709897 8.574364e+06
25  12075 612614.1 4683772 9.799130e+07
26  12106 606331.5 4693118 4.081914e+07
27  12115 615361.6 4702384 3.238215e+06
28  12123 614210.3 4708912 9.383202e+04
29  12123 614210.3 4708912 6.075477e+06
30  12123 614210.3 4708912 6.739686e+03
31  12125 631088.1 4703923 3.758122e+07
32  12130 610476.0 4700356 2.607542e+06
33  12136 618643.1 4698809 5.321862e+07
34  12156 603612.7 4704504 1.373999e+07
35  12156 603612.7 4704504 3.371689e+04
36  12156 603612.7 4704504 1.784716e+04
37  12156 603612.7 4704504 1.493681e+05
38  12156 600920.7 4704250 7.195805e+03
39  12165 623467.2 4685155 8.364310e+06
40  12168 633097.9 4713609 2.418246e+06
41  12173 602210.1 4692849 3.943830e+07
42  12184 610816.1 4697644 1.067326e+08
43  12502 610929.0 4659595 7.862394e+07
44  12503 617592.7 4654358 7.326900e+07
45  12513 606790.9 4673634 9.045891e+06
46  12516 619101.7 4662348 4.084114e+07
47  12517 622938.9 4664008 2.745140e+07
48  12521 611453.2 4669033 8.611940e+07
49  12523 602331.7 4660411 5.620575e+07

Here is my imperfect code, which crashes my computer:

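n <- 1:nrow(data)

for (i in seq(along = n)) {
  for (j in seq(along = n)) {
    # The condition below never changes inside the loop body, so once it is
    # TRUE (always the case when i == j) this while loop never ends.
    while (data[i,]$POSTAL == data[j,]$POSTAL) {
      if (data[i,]$area < data[j,]$area) {
        # temp2 is not created anywhere in this snippet.
        (temp2[i,]$x <- temp2[j,]$x) & (temp2[i,]$y <- temp2[j,]$y)
      }
    }
  }
}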
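For comparison, here is a minimal loop-based sketch that does terminate, assuming the goal is to copy x and y from the largest-area row of each POSTAL group into every row of that group. Instead of comparing every pair of rows with a `while`, it visits each distinct POSTAL value once. The helper name `fill_from_largest` is hypothetical, and the small data frame is copied from rows 33-38 of the data shown above.

```r
# Hypothetical helper: for each POSTAL group, copy x and y from the row with
# the largest area into every row of that group.
fill_from_largest <- function(df) {
  for (p in unique(df$POSTAL)) {
    rows <- which(df$POSTAL == p)            # row positions in this group
    best <- rows[which.max(df$area[rows])]   # position of the largest area
    df$x[rows] <- df$x[best]
    df$y[rows] <- df$y[best]
  }
  df
}

# Rows 33-38 of the question's data
data <- data.frame(
  POSTAL = c(12136, 12156, 12156, 12156, 12156, 12156),
  x = c(618643.1, 603612.7, 603612.7, 603612.7, 603612.7, 600920.7),
  y = c(4698809, 4704504, 4704504, 4704504, 4704504, 4704250),
  area = c(5.321862e+07, 1.373999e+07, 3.371689e+04, 1.784716e+04,
           1.493681e+05, 7.195805e+03)
)

result <- fill_from_largest(data)
print(result)
```

The last row of the 12156 group takes x and y from the first row of that group, which has the largest area; the single-row 12136 group is left as it was.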


Mouad_S 03 June 15 at 18:55


2 answers

I think you are trying to set the x and y values of all rows that share a POSTAL value to the values from the row with the largest area. You can accomplish this in base R with split-apply-combine:

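# Split the rows by POSTAL, copy x and y from the largest-area row into every
# row of the group, then bind the groups back together (ordered by POSTAL).
do.call(rbind, lapply(split(data, data$POSTAL), function(x) {
  x$x <- x$x[which.max(x$area)]
  x$y <- x$y[which.max(x$area)]
  x
}))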
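Because `split()` groups by sorted POSTAL value, the `split()`/`rbind()` result comes back ordered by POSTAL rather than in the original row order. If the original order matters, one base R alternative (a sketch, not part of the answer above) is `ave()`: it computes, for every row, the position of the largest-area row in its POSTAL group, and x and y are then indexed with those positions. The sample rows are rows 18, 19, 34, 35 and 38 of the question's data, chosen so that the POSTAL values are not already sorted.

```r
# Rows 18, 19, 34, 35 and 38 of the question's data
data <- data.frame(
  POSTAL = c(12997, 12017, 12156, 12156, 12156),
  x = c(592669.1, 627445.1, 603612.7, 603612.7, 600920.7),
  y = c(4914134, 4686235, 4704504, 4704504, 4704250),
  area = c(2.731502e+07, 4.773138e+07, 1.373999e+07, 3.371689e+04, 7.195805e+03)
)

# For every row, the position (in data) of the largest-area row in its POSTAL
# group. ave() applies FUN to the row positions of each group and recycles the
# single result across that group, so the output lines up with the original rows.
best <- ave(seq_len(nrow(data)), data$POSTAL,
            FUN = function(i) i[which.max(data$area[i])])

data$x <- data$x[best]
data$y <- data$y[best]
print(data)
```

The rows stay in their original order (12997, 12017, then the 12156 group), and only the last 12156 row gets new coordinates.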


josliber 03 June 15 at 19:16


Frank · Accepted Answer · 2015-06-03T19:20:17+0000

My guess is that the OP is looking for the same thing as @josliber's answer. Here's a non-base-R way:

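library(data.table)
# Convert data to a data.table in place, then, within each POSTAL group,
# overwrite x and y with the values from the row with the largest area.
setDT(data)[, c("x","y") := {ii = which.max(area) ; list(x[ii], y[ii])}, by = POSTAL]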

(In the example data above, this changes only one row: the 39th row, labelled 38, which takes its x and y from row 34.)
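The remark that only one row changes can be checked in base R. The sketch below uses rows 28-38 of the question's data, which contain the only repeated POSTAL values among the 50 rows shown, and a different technique from both answers: it builds a lookup table holding the largest-area row of each POSTAL (sort by POSTAL and decreasing area, keep the first row per POSTAL with `duplicated()`), then joins it back onto every row with `match()`. The names `sorted`, `lookup`, `idx` and `changed` are illustrative.

```r
# Rows 28-38 of the question's data; row names keep the original labels.
data <- data.frame(
  POSTAL = c(12123, 12123, 12123, 12125, 12130, 12136,
             12156, 12156, 12156, 12156, 12156),
  x = c(614210.3, 614210.3, 614210.3, 631088.1, 610476.0, 618643.1,
        603612.7, 603612.7, 603612.7, 603612.7, 600920.7),
  y = c(4708912, 4708912, 4708912, 4703923, 4700356, 4698809,
        4704504, 4704504, 4704504, 4704504, 4704250),
  area = c(9.383202e+04, 6.075477e+06, 6.739686e+03, 3.758122e+07,
           2.607542e+06, 5.321862e+07, 1.373999e+07, 3.371689e+04,
           1.784716e+04, 1.493681e+05, 7.195805e+03),
  row.names = 28:38
)

# Sort so the largest area comes first within each POSTAL, then keep the
# first row of each group: one row per POSTAL with its largest-area x and y.
sorted <- data[order(data$POSTAL, -data$area), ]
lookup <- sorted[!duplicated(sorted$POSTAL), c("POSTAL", "x", "y")]

# Join the lookup back onto every row by POSTAL, keeping the original order.
idx <- match(data$POSTAL, lookup$POSTAL)
result <- data
result$x <- lookup$x[idx]
result$y <- lookup$y[idx]

# Labels of the rows whose coordinates changed
changed <- rownames(data)[data$x != result$x | data$y != result$y]
print(changed)
```

On these rows it reports only row 38; the three 12123 rows already share the same coordinates, so they are unaffected.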
