Randomly assign without replacement using numbers
I have a data frame with 100 rows and a vector of four string values (A, B, C, D). I want to randomly assign these values to the rows. However, I want to assign A to 30 rows, B to 20 rows, C to 10 rows, and D to 40 rows. How can I do this?
df <- data.frame(ID=c(1:100))
values <- c("A", "B", "C", "D")
One way I thought of would be to create an ordered list of the numbers 1-100 and assign A to the first 10, and so on, but I think there must be a much better way to do this.
Here are two options. The first one probabilistically assigns values to a column in df. This does not guarantee that there will be exactly 30, 20, 10 and 40 each of A, B, C, D respectively. Rather, those counts are only what you would get in expectation.
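# replace = TRUE is required here: 100 draws cannot be taken from 4 values without replacement
df$values <- sample(values, nrow(df), replace = TRUE, prob = c(.3, .2, .1, .4))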
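To see how far a single draw lands from the 30/20/10/40 targets, the assigned values can be tabulated. This is a minimal sketch; the seed is an arbitrary choice made only so the run is repeatable, and the counts object is a name introduced here for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility (an assumption, not part of the question)

df <- data.frame(ID = 1:100)
values <- c("A", "B", "C", "D")

# Draw 100 values with replacement, weighted by the target proportions
df$values <- sample(values, nrow(df), replace = TRUE, prob = c(.3, .2, .1, .4))

# Tabulate the result; factor() keeps all four levels even if one is never drawn
counts <- table(factor(df$values, levels = values))
print(counts)
```

The counts always sum to 100, but the individual counts will generally differ from the targets from one run to the next.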
This second option is probably what you want. It randomly samples the row indices of the data frame (essentially shuffling the rows) and uses them as extraction indices (inside []), and then assigns to that shuffled set of rows a vector of A, B, C, D values created using rep to ensure exactly 30, 20, 10 and 40 occurrences of each value, respectively.
df$values[sample(1:nrow(df), nrow(df), FALSE)] <- rep(values, c(30,20,10,40))
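An equivalent way to get exact counts, sketched here as an alternative and not as the method above, is to build the full vector with rep and shuffle it directly with sample. The counts vector is a name introduced for illustration.

```r
df <- data.frame(ID = 1:100)
values <- c("A", "B", "C", "D")
counts <- c(30, 20, 10, 40)  # target number of rows per value

# Guard: the counts must cover every row exactly once
stopifnot(sum(counts) == nrow(df))

# Build the 100 labels, then shuffle them; sample(x) returns a random permutation of x
df$values <- sample(rep(values, times = counts))

# Check the result: A = 30, B = 20, C = 10, D = 40
table(df$values)
```

Shuffling the labels instead of the row indices gives the same distribution of assignments, and it avoids indexing into a column that may not exist yet.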