Randomly assign without replacement using numbers

I have a data set of 100 rows and a vector of four values (A, B, C, D). I want to randomly assign these values to the rows. However, I want to assign A to 30 rows, B to 20 rows, C to 10 rows, and D to 40 rows. How can I do this?

df <- data.frame(ID=c(1:100))
values <- c("A", "B", "C", "D")

      

One way I thought of would be to create an ordered list of the numbers 1-100 and assign A to the first 10, etc., but I think there must be a much better way to do this.



1 answer


Here are two options. The first one assigns values to a column in df probabilistically. This does not guarantee that there will be exactly 30, 20, 10 and 40 of A, B, C and D respectively. Rather, those are the counts you would expect on average.

df$values <- sample(values, nrow(df), replace = TRUE, prob = c(.3, .2, .1, .4))
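To see how far a single draw can drift from the target counts, you can tabulate the result. This is only a sketch: the seed is arbitrary and counts is a name introduced here for illustration. It samples with replacement, which sample() requires when drawing 100 values from a four-element vector.

```r
set.seed(1)  # arbitrary seed, only so the run is repeatable

df <- data.frame(ID = 1:100)
values <- c("A", "B", "C", "D")

# Each row independently gets A/B/C/D with probability .3/.2/.1/.4
df$values <- sample(values, nrow(df), replace = TRUE, prob = c(.3, .2, .1, .4))

# Tabulate with fixed levels so a value that was never drawn still shows as 0
counts <- table(factor(df$values, levels = values))
print(counts)
```

The counts always sum to 100, but the individual counts will generally differ from 30, 20, 10 and 40 from run to run.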

      



This second option is probably what you want. It randomly samples the row numbers of the data frame without replacement (essentially shuffling them), uses them as extraction indices (inside []), and then assigns to that shuffled set of rows a vector of A, B, C, D values created using rep, which ensures exactly 30, 20, 10 and 40 occurrences of each value, respectively.

df$values[sample(1:nrow(df), nrow(df), FALSE)] <- rep(values, c(30,20,10,40))
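An equivalent way to write the same idea, as a sketch, is to build the exact vector with rep first and then shuffle it with sample. The name counts is introduced here for illustration; table confirms the result.

```r
df <- data.frame(ID = 1:100)
values <- c("A", "B", "C", "D")
counts <- c(30, 20, 10, 40)  # target number of rows per value

# The targets must add up to the number of rows
stopifnot(sum(counts) == nrow(df))

# rep() builds 30 A's, 20 B's, 10 C's, 40 D's; sample() permutes them
df$values <- sample(rep(values, times = counts))

# Always exactly A: 30, B: 20, C: 10, D: 40; only the row order is random
print(table(df$values))
```

Shuffling the values rather than the row indices gives the same distribution of assignments; which form to use is a matter of taste.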

      
