Loops in R, aggregating data based on different variables

I have a data frame with 2332 rows. I want to find the rows where the variable "POSTAL" is equal, then assign to all of them the values from the row where the variable "area" is the largest.

Here are the first 50 rows:

> data[1:50,]
   POSTAL        x       y         area
0   12920 573385.9 4972933 8.384062e+06
1   12921 623487.7 4971908 8.233541e+07
2   12923 583786.9 4978081 1.474410e+08
3   12924 613452.4 4927788 1.497106e+07
4   12934 588962.9 4965368 2.194386e+08
5   12935 596550.0 4967100 1.888997e+08
6   12944 618378.6 4921592 2.534854e+07
7   12952 583074.3 4953381 2.943473e+07
8   12955 582523.7 4959810 5.204965e+07
9   12958 611949.9 4979674 9.186815e+07
10  12959 601546.4 4979545 1.037816e+08
11  12962 611088.7 4951280 1.079834e+08
12  12972 612442.2 4934335 2.356099e+08
13  12978 595047.1 4941416 9.280316e+06
14  12979 628230.8 4983172 1.076677e+07
15  12981 591559.5 4944906 3.203060e+08
16  12985 599050.4 4935220 1.643595e+08
17  12992 616585.6 4963995 1.989913e+08
18  12997 592669.1 4914134 2.731502e+07
19  12017 627445.1 4686235 4.773138e+07
20  12024 619994.9 4704246 7.021505e+06
21  12029 629805.8 4696477 5.399608e+07
22  12037 618566.6 4688290 9.184531e+07
23  12060 624089.4 4697165 8.745604e+07
24  12062 622755.7 4709897 8.574364e+06
25  12075 612614.1 4683772 9.799130e+07
26  12106 606331.5 4693118 4.081914e+07
27  12115 615361.6 4702384 3.238215e+06
28  12123 614210.3 4708912 9.383202e+04
29  12123 614210.3 4708912 6.075477e+06
30  12123 614210.3 4708912 6.739686e+03
31  12125 631088.1 4703923 3.758122e+07
32  12130 610476.0 4700356 2.607542e+06
33  12136 618643.1 4698809 5.321862e+07
34  12156 603612.7 4704504 1.373999e+07
35  12156 603612.7 4704504 3.371689e+04
36  12156 603612.7 4704504 1.784716e+04
37  12156 603612.7 4704504 1.493681e+05
38  12156 600920.7 4704250 7.195805e+03
39  12165 623467.2 4685155 8.364310e+06
40  12168 633097.9 4713609 2.418246e+06
41  12173 602210.1 4692849 3.943830e+07
42  12184 610816.1 4697644 1.067326e+08
43  12502 610929.0 4659595 7.862394e+07
44  12503 617592.7 4654358 7.326900e+07
45  12513 606790.9 4673634 9.045891e+06
46  12516 619101.7 4662348 4.084114e+07
47  12517 622938.9 4664008 2.745140e+07
48  12521 611453.2 4669033 8.611940e+07
49  12523 602331.7 4660411 5.620575e+07 

      

Here is my imperfect code, which crashes my computer:

n <- 1:nrow(data)

for (i in seq(along = n)) {
  for (j in seq(along = n)) {
    while (data[i,]$POSTAL == data[j,]$POSTAL) {
      if (data[i,]$area < data[j,]$area) {
        (temp2[i,]$x <- temp2[j,]$x) & (temp2[i,]$y <- temp2[j,]$y)
      }
    }
  }
}

      



2 answers


My guess is that the OP is looking for the same thing as in @josilber's answer. Here's a non-base-R way:

library(data.table)
setDT(data)[, c("x","y") := {ii = which.max(area) ; list(x[ii], y[ii])}, by = POSTAL]

      



(In the example above, this changes only one row, the 39th.)
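To check this on a small piece of the posted sample, here is a minimal sketch. It rebuilds rows 33 to 39 (as numbered in the printout) by hand; the names `sample_rows`, `updated` and `changed` are made up for illustration. It works on a `copy()` because `:=` modifies the table by reference.

```r
library(data.table)

# Rows 33 to 39 of the printout (POSTAL 12136, 12156 and 12165), typed in by hand
sample_rows <- data.table(
  POSTAL = c(12136, 12156, 12156, 12156, 12156, 12156, 12165),
  x      = c(618643.1, 603612.7, 603612.7, 603612.7, 603612.7, 600920.7, 623467.2),
  y      = c(4698809, 4704504, 4704504, 4704504, 4704504, 4704250, 4685155),
  area   = c(5.321862e+07, 1.373999e+07, 3.371689e+04, 1.784716e+04,
             1.493681e+05, 7.195805e+03, 8.364310e+06)
)

# Keep the original untouched so the two tables can be compared afterwards
updated <- copy(sample_rows)
updated[, c("x", "y") := {ii <- which.max(area); list(x[ii], y[ii])}, by = POSTAL]

# Positions (within this subset) of the rows whose coordinates were changed
changed <- which(updated$x != sample_rows$x | updated$y != sample_rows$y)
changed  # 6, i.e. the row printed with index 38
```

Only the last 12156 row differs from the largest-area row of its group, so it is the only one that gets overwritten.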



I think you are trying to set all x and y values for a given POSTAL value to the values from the row where the area is largest. You can accomplish this in base R with split-apply-combine:



do.call(rbind, lapply(split(data, data$POSTAL), function(x) {
  x$x <- x$x[which.max(x$area)]
  x$y <- x$y[which.max(x$area)]
  x
}))
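One thing to be aware of: because the result is rebuilt from the split pieces, the rows come back grouped by POSTAL and with new row names. If the original row order matters, a possible alternative (a sketch, not something I have run on the full 2332 rows) is to use `ave()` to look up, for every row, the index of the largest-area row in its POSTAL group. The four rows below are taken from the posted sample; `best` and `result` are illustrative names.

```r
# Four rows from the posted sample, in their original (unsorted) order
data <- data.frame(
  POSTAL = c(12997, 12017, 12156, 12156),
  x      = c(592669.1, 627445.1, 603612.7, 600920.7),
  y      = c(4914134, 4686235, 4704504, 4704250),
  area   = c(2.731502e+07, 4.773138e+07, 1.373999e+07, 7.195805e+03)
)

# For each row, the index of the largest-area row within the same POSTAL group
best <- ave(seq_len(nrow(data)), data$POSTAL,
            FUN = function(idx) idx[which.max(data$area[idx])])

# Copy x and y from that row; area and the row order are left untouched
result <- data
result$x <- data$x[best]
result$y <- data$y[best]
result
```

Here the fourth row picks up the x and y of the third one, and the rows stay in the order 12997, 12017, 12156, 12156.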

      
