How do I generate a vector that satisfies some conditions?
all!
how to create a vector that satisfies some conditions?
Objective: To create a vector a
such that length(a)=400000
that consists of eight elements: 0, 5, 10, 50, 500, 5000, 50000, 300000
. Each element appears a certain number of times, namely 290205, 100000, 8000, 1600, 160, 32, 2, 1
, respectively. In addition, it is a
locked in 4000 "groups" of 100 consecutive elements; name them a_k, k=1,...,4000
. These groups must satisfy the following:
- The sum of each group exceeds 150, i.e.
sum_i a_k_i>150
for everyonek
. - Elements
5
,10
and50
appear 25 to 29 times in each group, that is, for all thek
set{i|a_i_k in (5,10,50)}
has a value from 25 to 29. -
0
never appears more than 8 times in a row in any group.
I've tried this many times, but it doesn't seem to work: My current code looks like this:
T <- 4*10^(5) # data size
x <- c(0, 5, 10, 50, 500, 5000, 50000, 300000) #seed vector
t <- c(290205, 100000, 8000, 1600, 160, 32, 2, 1) #frequency
A <- matrix(0, 4000, 100) #4000 groups
k <- rep(0, times = 8) #record the number of seeds
for(m in 1:4000) {
p <- (t - k)/(T - 100*(m - 1)) #seed probability
A[, m] <- sample(x, 100, replace = TRUE, prob = p) #group m
sm <- 0
i <- 0
for(j in 1:92) {
if(sum(A[m,j:j + 8])==0){
if(A[m,j] > 0 & A[m,j] < 500) {i <- i+1}
sm <- sm+A[100*m+j]
}
else j <- 0
}
if (sm >= 150 & i > 24 & i < 30 & j != 0) {
m <- m + 1
for (n in seq_len(x)) {
k[n] <- sum(A[, m+1] == x[n]) + k[n]
}
}
}
I can get it started and maybe someone can help move on to the next step. My approach is to start with constraints and sample
define numbers.
set.seed(77)
choose <- c(0,5,10,50,500,5000,50000,300000)
freqs <- c(290205,100000,8000,1600,160,32,2,1)
probs <- freqs/sum(freqs)
check.sum <- function(vec) sum(vec) >= 150
check.interval <- function(vec) abs(sum(vec %in% c(5,10,50))-27)<=2
check.runs <- function(vec, runmax=8) max(rle(vec)$lengths[rle(vec)$values==0]) <= runmax
check.all <- function(vector) {
logicals <- c(check.sum(vector),
check.runs(vector),
check.runs(vector)
)
return(all(logicals))
}
nums <- NULL
res <- list()
for(i in 1:4000) {
nums <- numeric(100)
while(!check.all(nums)) {nums <- sample(choose, 100, replace=T,prob=probs)}
res[i] <- list(nums)
}
str(res)
List of 4000
$ : num [1:100] 1e+01
So this gives you a list of 4000 groups of 100 numbers that fit the limits. It only took two seconds of system time.
The next step is for someone to get a way to create something like this other than eliminating 300,000 after using it, and 50,000 when using it twice, etc.
How easy is it to build? For example:
amat<-matrix(rep(c(rep(rep(c(0,5),c(8,3)),8),
rep(c(0,NA),c(8,4))),4000),nrow=100)
amat[97:100,1:2205]<-c(rep(10,3),0)
amat[97:98,2206:4000]<-c(5,5)
amat[99:100,2206:2897]<-c(10,10)
amat[99:100,2898]<-c(5,50)
amat[99:100,2899:3307]<-c(5,50)
amat[99:100,3308:3902]<-c(50,50)
amat[which(is.na(amat))]<-rep(c(10,500,5000,5e4,3e5),c(1,160,32,2,1))
a<-c(amat)
This satisfies all of your conditions:
The item is counted:
>sapply(c(0,5,10,50,500,5000,50000,300000),function(x)length(which(a==x)))
[1] 290205 100000 8000 1600 160 32 2 1
Group sums:
> table(colSums(amat)>=150)
TRUE
4000
Frequency 5,10,50
:
> table(sapply(1:4000,function(x)abs(sum(amat[,x] %in% c(5,10,50))-27)<=2))
TRUE
4000
Runs 0
:
> table(sapply(1:4000,function(x)max(rle(amat[,x])$lengths[rle(amat[,x])$values==0])<=8))
#If this is slow, we can just use max(rle(amax[,x]))<=8
# because there aren't many valid groups with strings of 9+
# non-0 elements
TRUE
4000
if in fact we are never allowed to have strings of 9 0
s, we need to make a slight adjustment for groups 2: 2206 because, for example,a[100:108]==0
Inspired by @plafort's approach, I came up with the following, which seems to be very fast and should be able to generate all vectors that satisfy your conditions:
elts<-c(0,5,10,50,500,5000,50000,300000)
freq<-c(290205,100000,8000,1600,160,32,2,1)
ngrp<-4000L
grp.cond1<-function(x)sum(x)>=150
grp.cond2<-function(x)abs(sum(x %in% c(5,10,50))-27)<=2
grp.cond3<-function(x)max(rle(x)$lengths[rle(x)$values==0])<=8
check.all<-function(mat){
all(sapply(1:ncol(mat),function(y)grp.cond1(mat[,y])),
sapply(1:ncol(mat),function(y)grp.cond2(mat[,y])),
sapply(1:ncol(mat),function(y)grp.cond3(mat[,y])))}
while(!check.all(amat)){amat<-matrix(sample(rep(elts,freq)),ncol=ngrp)}
a<-c(amat)
I also wrote the code in such a way that it is easy to generalize to other items / group counts, group numbers, and group conditions.
Unfortunately, these conditions seem to be rather harsh, and it a
can take a long time to create an acceptable one . I skipped the loop while
1300 times with no success ...
Thanks everyone! I figured out my problem.
rm(list = ls())
media <- matrix(rep(rep(c(0,5,NA),c(72,25,3)),4000),nrow=100)
media[98:100,1:2400] <-c(10,10,10)
media[98:99,2401:3200] <-c(50,10)
media[98:99,3201:4000] <-c(50,0)
media[100,2401:4000] <-rep(c(0,500,5000,50000,300000),c(1405,160,32,2,1))
obj1 <- matrix(0,100L,4000)
obj2 <-obj1
grp.cond<-function(x) max(rle(x)$lengths[rle(x)$values==0])<=8
elts<-c(0,5,10,50,500,5000,50000,300000)
for(i in 1:4000){
freq<-c(sapply(elts, function(x) length(which(media[,i]==x))))
while(!grp.cond(obj1[,i])){obj1[,i]<-c(sample(rep(elts,freq)))}
i<-i+1
}
elts1<-c(1:4000)
freq1<-rep(1,times=4000)
a1<-sample(rep(elts1,freq1))
for(i in 1:4000){obj2[,i]<-obj1[,a1[i]]}
a <- c(obj2)