R-frequency based on presence / absence samples

Question

I was not sure how to search for this topic, so I apologize in advance if this question has already been asked. The questions about frequency tables that I found did not help me solve it.

I have the following data frame, where 1 indicates a positive result and 2 a negative result:

d1 <- data.frame( Household = c(1:5), State = c("AL","AL","AL","MI","MI"), Electricity = c(1,1,1,2,2),
Fuelwood = c(2,2,1,1,1))

I want to create a frequency table that shows, per state, the percentage of people using Electricity, Fuelwood, and Electricity + Fuelwood, for example d2:

d2 <- data.frame (State = c("AL", "MI"), Electricity = c(66.6,0), Fuelwood = c(0,100), ElectricityANDFuelwood = c(33.3,0))

Please note that my real data frame has about 42 thousand households, 5 energy sources and 27 states.

r

Gil33, 8 Dec 2014 at 13:29

1 answer

akrun · Accepted Answer · 2014-12-08T13:47:14+0000

We can look for the rows in d1 where both Electricity and Fuelwood are positive (1). Using this logical index, we change the values of Electricity and Fuelwood in those rows to negative (2), so that households using both sources are no longer counted under either single source. Then we create an additional column, ElectricityANDFuelwood, from the index indx. Next, we reshape the data from wide to long form with melt, keep only the rows where value is 1 and only the two columns State and variable, and use table and prop.table to calculate the frequency and the relative frequency.

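# Logical index: households that use both Electricity and Fuelwood
indx <- with(d1, Electricity == 1 & Fuelwood == 1)

# Recode those households as negative (2) in the single-source columns
d1[indx, 3:4] <- 2
# Add the combined indicator (1/0) and drop the Household column
dT <- transform(d1, ElectricityANDFuelwood = indx + 0)[-1]

library(reshape2)
# Wide to long; keep only the positive rows and the State/variable columns
dT1 <- subset(melt(dT, id.vars = 'State'), value == 1, select = 1:2)
# Row percentages per State
round(100 * prop.table(table(dT1), margin = 1), 2)
#      variable
#State Electricity Fuelwood ElectricityANDFuelwood
#  AL       66.67     0.00                  33.33
#  MI        0.00   100.00                   0.00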
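
Since the real data has 5 energy sources, writing out every combination by hand may become tedious. The following is only a sketch in base R, not part of the original answer: it labels each household with the combination of sources it uses and then tabulates those labels per state. It assumes that every column other than Household and State is an energy source coded 1/2; the names sources, combo and pct, and the "None" label, are made up for illustration.

```r
# Example data from the question (1 = positive, 2 = negative)
d1 <- data.frame(Household = 1:5,
                 State = c("AL", "AL", "AL", "MI", "MI"),
                 Electricity = c(1, 1, 1, 2, 2),
                 Fuelwood = c(2, 2, 1, 1, 1))

# Assumption: every column except Household and State is an energy source
sources <- setdiff(names(d1), c("Household", "State"))

# Label each household by the combination of sources it uses,
# e.g. "Electricity" or "ElectricityANDFuelwood"
combo <- apply(d1[sources] == 1, 1, function(used) {
  if (any(used)) paste(sources[used], collapse = "AND") else "None"
})

# Row percentages: share of households per State in each combination
pct <- round(100 * prop.table(table(State = d1$State, Combination = combo),
                              margin = 1), 2)
pct
```

Only combinations that actually occur in the data become columns, so with many sources the table stays as small as the data allows.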

Or a data.table solution contributed by @David Arenburg:

library(data.table)
d2 <- as.data.table(d1[-1])[, ElectricityANDFuelwood := 
             (Electricity == 1 & Fuelwood == 1)]
d2[(ElectricityANDFuelwood), (2:3) := 2]
d2[, lapply(.SD, function(x) 100*sum(x == 1)/.N), by = State]  
#   State Electricity Fuelwood ElectricityANDFuelwood
#1:    AL    66.66667        0               33.33333
#2:    MI     0.00000      100                0.00000
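
The data.table code above refers to the source columns by position (2:3), which could silently point at the wrong columns if the layout of the real data differs. A possible variant, offered as a sketch rather than as the contributor's method, refers to the columns by name instead. The names dt, src and res are hypothetical, and d1 is rebuilt here because the earlier code modifies it.

```r
library(data.table)

# Example data from the question (1 = positive, 2 = negative)
d1 <- data.frame(Household = 1:5,
                 State = c("AL", "AL", "AL", "MI", "MI"),
                 Electricity = c(1, 1, 1, 2, 2),
                 Fuelwood = c(2, 2, 1, 1, 1))

dt <- as.data.table(d1)
src <- c("Electricity", "Fuelwood")  # source columns, referenced by name

# Flag households that use both sources
dt[, ElectricityANDFuelwood := Electricity == 1 & Fuelwood == 1]
# Recode those households as negative (2) in the single-source columns
dt[(ElectricityANDFuelwood), (src) := 2]

# Percentage of positive households per State, for the chosen columns only
res <- dt[, lapply(.SD, function(x) 100 * mean(x == 1)),
          by = State, .SDcols = c(src, "ElectricityANDFuelwood")]
res
```

Using .SDcols also means the Household column does not have to be dropped beforehand.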
