Combining irrelevant / similar observations into one category
After running a survey on perceived problems in each district, I got this data frame. Since the survey offered fixed choices plus an open-ended option, the answers to the open-ended question are often irrelevant (see below):
library(dplyr)
library(splitstackshape)
df = read.csv("http://pastebin.com/raw.php?i=tQKHWMvL")
# Splitting multiple answers into different rows.
df = cSplit(df, "Problems", ",", direction = "long")
df = df %>%
  group_by(Problems) %>%
  summarise(Total = n()) %>%
  mutate(freq = Total/sum(Total)*100) %>%
  arrange(desc(freq))
This results in the following data frame:
> df
Source: local data table [34 x 3]
Problems Total freq
1 Hurtos o robos sin violencia 245 25.6008359
2 Drogas 232 24.2424242
3 Peleas callejeras 162 16.9278997
4 Ningún problema 149 15.5694880
5 Agresiones 66 6.8965517
6 Robos con violencia 62 6.4785789
7 Quema contenedores 6 0.6269592
8 Ruidos 5 0.5224660
9 NS/NC 4 0.4179728
10 Desempleo 2 0.2089864
.. ... ... ...
>
As you can see, the results after row 9 are mostly irrelevant (only one or two respondents per category), so I would like to group them into a single category (such as "Others") without losing their link to the neighborhood (so I can't just rename the values at this point). Any suggestions?
splitstackshape imports the data.table package (so you don't even need to call library on it) and assigns the data.table class to your data set, so I'll just continue with data.table syntax from there, especially since nothing beats data.table when it comes to assignments on a subset.
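To see the class claim for yourself, here is a minimal sketch on made-up data. The data frame toy and its id column are hypothetical and only stand in for the survey data; the cSplit call mirrors the one in the question.

```r
library(splitstackshape)

# Hypothetical stand-in for the survey: one row per respondent,
# multiple answers packed into one comma-separated string.
toy <- data.frame(
  id       = 1:2,
  Problems = c("Drogas,Ruidos", "Agresiones"),
  stringsAsFactors = FALSE
)

# Same call shape as in the question: one row per individual answer.
long <- cSplit(toy, "Problems", ",", direction = "long")

# The result carries the data.table class, so data.table syntax applies.
class(long)
```

If the result inherits from data.table, the `:=` and `by =` syntax used below should work on it directly.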
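In other words, instead of that long dplyr pipeline, you can simply run the following on df as it is right after the cSplit step (before the dplyr summary overwrites it):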
df[, freq := .N / nrow(df) * 100 , by = Problems]
df[freq < 6, Problems := "OTHER"]
And you're good to go.
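As a rough illustration of what those two lines do, here is a self-contained sketch on invented data. The toy table, the Neighborhood column name, and the cutoff of 15 are assumptions made up for this example; the real data and threshold will differ.

```r
library(data.table)

# Hypothetical long-format data: one row per (respondent, answer),
# i.e. the shape produced by cSplit(..., direction = "long").
toy <- data.table(
  Neighborhood = c("A", "A", "B", "B", "C", "C", "C", "A", "B", "C"),
  Problems     = c("Drogas", "Ruidos", "Drogas", "Drogas", "Agresiones",
                   "Drogas", "Agresiones", "Desempleo", "Agresiones", "Drogas")
)

# Share of each answer (in percent), computed per group and stored on every row.
toy[, freq := .N / nrow(toy) * 100, by = Problems]

# Relabel the rare answers in place; 15 is an arbitrary cutoff for this toy data.
toy[freq < 15, Problems := "OTHER"]

# Neighborhood is untouched, so per-neighborhood counts are still available.
toy[, .N, by = .(Neighborhood, Problems)]
```

Only the Problems value changes on the affected rows, so each answer keeps its neighborhood. Note that the freq column still holds the shares from before the relabeling, which is why the summary has to be recomputed afterwards.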
You can check the new summary table using
df[, .(freq = .N/nrow(df) * 100), by = Problems][order(-freq)]
# 1: Hurtos o robos sin violencia 25.600836
# 2: Drogas 24.242424
# 3: Peleas callejeras 16.927900
# 4: Ningún problema 15.569488
# 5: Agresiones 6.896552
# 6: Robos con violencia 6.478579
# 7: OTHER 4.284222
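If you would rather stay with dplyr, the same idea can probably be expressed with a grouped mutate instead of summarise, so that the frequency lands on every row before relabeling. This is only a sketch on invented data: the toy data frame, the Neighborhood column, and the cutoff of 15 are assumptions for illustration, not part of the original data.

```r
library(dplyr)

# Hypothetical long-format data: one row per (respondent, answer).
toy <- data.frame(
  Neighborhood = c("A", "A", "B", "B", "C", "C", "C", "A", "B", "C"),
  Problems     = c("Drogas", "Ruidos", "Drogas", "Drogas", "Agresiones",
                   "Drogas", "Agresiones", "Desempleo", "Agresiones", "Drogas"),
  stringsAsFactors = FALSE
)

lumped <- toy %>%
  group_by(Problems) %>%
  mutate(freq = n() / nrow(toy) * 100) %>%   # share per answer, kept on each row
  ungroup() %>%
  # 15 is an arbitrary cutoff for this toy data
  mutate(Problems = if_else(freq < 15, "OTHER", Problems))

# Recompute the summary on the relabeled data.
summary_tbl <- lumped %>%
  count(Problems, name = "Total") %>%
  mutate(freq = Total / sum(Total) * 100) %>%
  arrange(desc(freq))
```

Here lumped keeps one row per answer together with its neighborhood, while summary_tbl is the collapsed table with the rare answers merged into "OTHER".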