# R: aggregate by mean or most frequent value

I am trying to aggregate a data frame that has both numeric and factor columns. For a numeric column I would like the mean; for a factor column I would like the most frequent value. I wrote the following function, but I don't get the output I expect:

```r
meanOrMostFreq <- function(x){
  if(class(x) == 'factor'){
    tbl <- as.data.frame(table(x))
    tbl$Var1 <- as.character(tbl$Var1)
    return(tbl[tbl$Freq == max(tbl$Freq),'Var1'])
  }
  if(class(x) == 'numeric'){
    meanX <- mean(x, na.rm = TRUE)
    return(meanX)
  }
}
```

This is how I use it:

```r
df1 <- iris[1:148,]
df1$letter1 <- as.factor(rep(letters[1:4], 37))

momf <- aggregate(.~ Species, df1, FUN = function(x) meanOrMostFreq(x))
```

And the results:

```
> momf
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width letter1
1     setosa     5.006000    3.428000     1.462000       0.246    2.46
2 versicolor     5.936000    2.770000     4.260000       1.326    2.54
3  virginica     6.610417    2.964583     5.564583       2.025    2.50
```

I am hoping to get the actual letter in the last column instead of a number. Any suggestions on what I am doing wrong?

Here's a way using `data.table`:

```r
library(data.table)
# setDT() converts df1 to a data.table in place
setDT(df1)[, lapply(.SD, function(x) if (is.numeric(x)) mean(x, na.rm = TRUE) else
  names(which.max(table(x)))), by = Species]

#         Species Sepal.Length Sepal.Width Petal.Length Petal.Width letter1
#1:     setosa     5.006000    3.428000     1.462000       0.246       a
#2: versicolor     5.936000    2.770000     4.260000       1.326       c
#3:  virginica     6.610417    2.964583     5.564583       2.025       a
```
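
One caveat with `names(which.max(table(x)))`: when several values are tied, `which.max()` returns only the first one in level order. That matters here, because in `df1` the most frequent letter is tied in every group (for setosa, `a` and `b` each occur 13 times), so the `a`/`c`/`a` above are tie-breaks. A minimal sketch with a hypothetical toy factor `x`, including one way to get all tied values instead:

```r
# Hypothetical toy factor in which "a" and "b" are tied for most frequent
x <- factor(c("b", "a", "b", "a", "c"))
tab <- table(x)               # counts in level order: a = 2, b = 2, c = 1

names(which.max(tab))         # "a": which.max() returns the first maximum only
names(tab)[tab == max(tab)]   # "a" "b": every tied value
```

Returning all tied values gives a result of varying length, which does not fit into a single cell of the aggregated table, so picking one value (as `which.max()` does) is usually what you want here.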

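Going through the formula interface of `aggregate` apparently loses the metadata that marks `letter1` as a factor (the numbers you got are the means of its integer codes, e.g. 2.46 for setosa). Using the data frame interface instead worked for me: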

```r
> meanOrMostFreq
function(x){
  if(class(x) == 'factor'){
    return(  names(which.max(table(x))) )
  }
  if(class(x) == 'numeric'){
    meanX <- mean(x, na.rm = TRUE)
    return(meanX)
  }
}
> aggregate(df1[-5], df1[5], meanOrMostFreq)
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width letter1
1     setosa     5.006000    3.428000     1.462000       0.246       a
2 versicolor     5.936000    2.770000     4.260000       1.326       c
3  virginica     6.610417    2.964583     5.564583       2.025       a
```

Since `aggregate.formula` and `aggregate.data.frame` behave differently here, this looks like a bug to me.
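
A small sketch of where the numbers come from. It assumes (my reading of `aggregate.formula`, not something verified above) that a `.` on the left-hand side is expanded into a `cbind()` of the remaining columns. `cbind()` packs everything into one numeric matrix, so a factor contributes only its integer codes. The toy vectors `num` and `f` are hypothetical:

```r
# Hypothetical toy vectors, used only to illustrate the coercion
num <- c(1.5, 2.5, 3.5)
f <- factor(c("a", "b", "a"))

m <- cbind(num, f)   # both columns end up in a single numeric matrix
class(m[, "f"])      # "numeric": the factor labels are gone
m[, "f"]             # 1 2 1, i.e. the integer codes of the levels

# Rebuilding the question's data: the mean of the setosa codes
df1 <- iris[1:148, ]
df1$letter1 <- as.factor(rep(letters[1:4], 37))
mean(as.integer(df1$letter1[df1$Species == "setosa"]))  # 2.46
```

The last line reproduces the 2.46 shown for setosa in the question's output, which is consistent with the factor being averaged as numbers before `meanOrMostFreq` ever sees it.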

Alternative using package `plyr`:

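```r
library(plyr)
# lapply() keeps each column's type (sapply() would coerce all to character);
# df[-5] drops the grouping column Species
ddply(df1, .(Species), function(df) {
  as.data.frame(lapply(df[-5], meanOrMostFreq))
})
```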
