How do I generate a vector that satisfies some conditions?

all! how to create a vector that satisfies some conditions?
Objective: To create a vector a

such that length(a)=400000

that consists of eight elements: 0, 5, 10, 50, 500, 5000, 50000, 300000

. Each element appears a certain number of times, namely 290205, 100000, 8000, 1600, 160, 32, 2, 1

, respectively. In addition, it is a

locked in 4000 "groups" of 100 consecutive elements; name them a_k, k=1,...,4000

. These groups must satisfy the following:

  • The sum of each group exceeds 150, i.e. sum_i a_k_i>150

    for everyone k

    .
  • Elements 5

    , 10

    and 50

    appear 25 to 29 times in each group, that is, for all the k

    set {i|a_i_k in (5,10,50)}

    has a value from 25 to 29.
  • 0

    never appears more than 8 times in a row in any group.

I've tried this many times, but it doesn't seem to work: My current code looks like this:

     T <- 4*10^(5)   # data size  
            x <- c(0, 5, 10, 50, 500, 5000, 50000, 300000)      #seed vector  
            t <- c(290205, 100000, 8000, 1600, 160, 32, 2, 1)   #frequency  
            A <- matrix(0, 4000, 100)    #4000 groups  
            k <- rep(0, times = 8)        #record the number of seeds   
            for(m in 1:4000) {        
            p <- (t - k)/(T - 100*(m - 1))      #seed probability  
            A[, m] <- sample(x, 100, replace = TRUE, prob = p)  #group m   
            sm <- 0         
            i <- 0    
              for(j in 1:92) {  
                  if(sum(A[m,j:j + 8])==0){  
                     if(A[m,j] > 0 & A[m,j] < 500) {i <- i+1}  
                        sm <- sm+A[100*m+j]       
                    }  
                   else j <- 0   
                }                
                       if (sm >= 150 & i > 24 & i < 30 & j != 0) {    
                           m <- m + 1  
                           for (n in seq_len(x)) {  
                               k[n] <- sum(A[, m+1] == x[n]) + k[n]  
                            }  
                        }  
            }  

      

+3


source to share


4 answers


I can get it started and maybe someone can help move on to the next step. My approach is to start with constraints and sample

define numbers.

set.seed(77)
choose <- c(0,5,10,50,500,5000,50000,300000)
freqs <- c(290205,100000,8000,1600,160,32,2,1)
probs <- freqs/sum(freqs)
check.sum <- function(vec) sum(vec) >= 150
check.interval <- function(vec) abs(sum(vec %in% c(5,10,50))-27)<=2
check.runs <- function(vec, runmax=8) max(rle(vec)$lengths[rle(vec)$values==0]) <= runmax

check.all <- function(vector) {
  logicals <- c(check.sum(vector), 
                check.runs(vector),
                check.runs(vector)
                )
  return(all(logicals))

}

nums <- NULL
res <- list()
for(i in 1:4000) {
  nums <- numeric(100)
  while(!check.all(nums)) {nums <- sample(choose, 100, replace=T,prob=probs)}

  res[i] <- list(nums)
}

str(res)
List of 4000
 $ : num [1:100] 1e+01

      



So this gives you a list of 4000 groups of 100 numbers that fit the limits. It only took two seconds of system time.

The next step is for someone to get a way to create something like this other than eliminating 300,000 after using it, and 50,000 when using it twice, etc.

+2


source


How easy is it to build? For example:

amat<-matrix(rep(c(rep(rep(c(0,5),c(8,3)),8),
               rep(c(0,NA),c(8,4))),4000),nrow=100)
amat[97:100,1:2205]<-c(rep(10,3),0)
amat[97:98,2206:4000]<-c(5,5)
amat[99:100,2206:2897]<-c(10,10)
amat[99:100,2898]<-c(5,50)
amat[99:100,2899:3307]<-c(5,50)
amat[99:100,3308:3902]<-c(50,50)
amat[which(is.na(amat))]<-rep(c(10,500,5000,5e4,3e5),c(1,160,32,2,1))

a<-c(amat)

      

This satisfies all of your conditions:

The item is counted:

>sapply(c(0,5,10,50,500,5000,50000,300000),function(x)length(which(a==x)))
[1] 290205 100000   8000   1600    160     32      2      1

      

Group sums:



> table(colSums(amat)>=150)

TRUE 
4000 

      

Frequency

5,10,50

:

> table(sapply(1:4000,function(x)abs(sum(amat[,x] %in% c(5,10,50))-27)<=2))

TRUE 
4000 

      

Runs 0

:

> table(sapply(1:4000,function(x)max(rle(amat[,x])$lengths[rle(amat[,x])$values==0])<=8))
#If this is slow, we can just use max(rle(amax[,x]))<=8
#  because there aren't many valid groups with strings of 9+
#  non-0 elements

TRUE 
4000 

      

if in fact we are never allowed to have strings of 9 0

s, we need to make a slight adjustment for groups 2: 2206 because, for example,a[100:108]==0

+2


source


Inspired by @plafort's approach, I came up with the following, which seems to be very fast and should be able to generate all vectors that satisfy your conditions:

elts<-c(0,5,10,50,500,5000,50000,300000)
freq<-c(290205,100000,8000,1600,160,32,2,1)
ngrp<-4000L

grp.cond1<-function(x)sum(x)>=150
grp.cond2<-function(x)abs(sum(x %in% c(5,10,50))-27)<=2
grp.cond3<-function(x)max(rle(x)$lengths[rle(x)$values==0])<=8

check.all<-function(mat){
  all(sapply(1:ncol(mat),function(y)grp.cond1(mat[,y])),
      sapply(1:ncol(mat),function(y)grp.cond2(mat[,y])),
      sapply(1:ncol(mat),function(y)grp.cond3(mat[,y])))}

while(!check.all(amat)){amat<-matrix(sample(rep(elts,freq)),ncol=ngrp)}
a<-c(amat)

      

I also wrote the code in such a way that it is easy to generalize to other items / group counts, group numbers, and group conditions.

Unfortunately, these conditions seem to be rather harsh, and it a

can take a long time to create an acceptable one . I skipped the loop while

1300 times with no success ...

+2


source


Thanks everyone! I figured out my problem.

rm(list = ls())  
media <- matrix(rep(rep(c(0,5,NA),c(72,25,3)),4000),nrow=100)  
media[98:100,1:2400] <-c(10,10,10)  
media[98:99,2401:3200] <-c(50,10)  
media[98:99,3201:4000] <-c(50,0)  
media[100,2401:4000] <-rep(c(0,500,5000,50000,300000),c(1405,160,32,2,1))  
obj1 <- matrix(0,100L,4000)  
obj2 <-obj1  
grp.cond<-function(x) max(rle(x)$lengths[rle(x)$values==0])<=8  
elts<-c(0,5,10,50,500,5000,50000,300000)  
for(i in 1:4000){  
freq<-c(sapply(elts, function(x) length(which(media[,i]==x))))  
while(!grp.cond(obj1[,i])){obj1[,i]<-c(sample(rep(elts,freq)))}  
i<-i+1  
}  
elts1<-c(1:4000)  
freq1<-rep(1,times=4000)  
a1<-sample(rep(elts1,freq1))  
for(i in 1:4000){obj2[,i]<-obj1[,a1[i]]} 
a <- c(obj2)

      

0


source







All Articles