How to generate matrices where A) each row has a single value of one; B) each row sums to one

This is a two-part problem. The first part is to create a square N×N matrix in which exactly one randomly chosen element in each row is 1 and all other elements are zero (i.e. the sum of the elements in each row is 1).

The second part is to create a square N×N matrix in which the sum of the elements in each row is 1, but each element is drawn from some distribution, for example the normal distribution.

Related questions include "Create a matrix with a conditional sum on each row - R". MATLAB seems to do what I want automatically (see "Why does this happen with a random matrix so all rows add up to 1?"), but I'm looking for a solution in R.

Here's what I've tried:

# PART 1

N <- 50
x <- matrix(0,N,N)
lapply(1:N, function(y){
x[y,sample(N,1)]<- 1
})

      

(I am still getting zeros)

# PART 2
N <- 50
x <- matrix(0,N,N)
lapply(1:N, function(y){
x[y,]<- rnorm(N)
})

      

(Scaling required)


3 answers


Here is why the lapply loop never changes the matrix. You are trying to iterate over the rows of x and modify the matrix, but what you are modifying is a local copy of x inside the function, not the x in the global environment.
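The copy behaviour described above can be seen in a minimal sketch. The smaller N and the `res` variable are illustrative assumptions, not part of the original question; each call returns the sum of the copy it modified.

```r
N <- 5                      # smaller than the question's 50, for illustration
x <- matrix(0, N, N)

res <- lapply(1:N, function(y) {
  # This assignment creates a copy of x local to the function call
  x[y, sample(N, 1)] <- 1
  sum(x)                    # the local copy now holds exactly one 1
})

unlist(res)                 # every call saw its own copy with a single 1
sum(x)                      # the global x is untouched and still sums to 0
```

Each call starts again from the unmodified global x, which is why the result of the original attempt is still all zeros.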

The simplest fix is to use a for loop:

for (y in 1:N) {
  x[y,sample(N,1)]<- 1
}

      

The apply family of functions should be used for their return values rather than for side effects.

The way to do this is to return the rows and then rbind them into a matrix. The second example is shown here, as it lends itself more naturally to apply:



do.call(rbind, lapply((1:N), function(i) rnorm(N)))

      

However, this is more readable:

matrix(rnorm(N*N), N, N)

      

Now, to scale this so that the row sums are 1: you can use the fact that matrices are stored in column-major order and that vectors are recycled, which means you can simply divide the matrix M by rowSums(M). Using a more manageable N=5:

m <- matrix(rnorm(N*N), N, N)
m/rowSums(m)
##           [,1]       [,2]        [,3]        [,4]        [,5]
## [1,] 0.1788692  0.5398464  0.24980924 -0.01282655  0.04430168
## [2,] 0.4176512  0.2564463  0.11553143  0.35432975 -0.14395871
## [3,] 0.3480568  0.7634421 -0.38433940  0.34175983 -0.06891932
## [4,] 1.1807180 -0.0192272  0.16500179 -0.31201400 -0.01447859
## [5,] 1.1601173 -0.1279919 -0.07447043  0.20865963 -0.16631458 
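As a quick check of the scaling step, here is a sketch that verifies the row sums and compares the recycling-based division with the more explicit base R `sweep`. The `set.seed` call and the names `scaled` and `scaled_sweep` are assumptions added for reproducibility and illustration.

```r
set.seed(1)                         # assumed seed, only for reproducibility
N <- 5
m <- matrix(rnorm(N * N), N, N)

# Element [i, j] sits at position i + (j - 1) * N in column-major storage,
# so the recycled vector rowSums(m) lines up with row i for every column.
scaled <- m / rowSums(m)

# The same operation written explicitly: divide each row (margin 1) by its sum
scaled_sweep <- sweep(m, 1, rowSums(m), "/")

rowSums(scaled)                     # all equal to 1 up to floating-point error
all.equal(scaled, scaled_sweep)
```

Note that with normally distributed values a row sum can land close to zero, in which case the scaled entries of that row become very large; the scaled entries are also no longer normally distributed.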

      



Here's another solution without a loop. It uses the two-column matrix indexing supported by the "[<-" function. It builds a two-column index matrix whose first column is simply an ascending sequence giving the row positions, and whose second column (the one that selects the column positions) holds random integers. (This is a vectorized version of Matthew's "simple method", and I suspect it will be faster because there is only one call to sample.):

M <- matrix(0,N,N)
M[ cbind(1:N, sample(1:N, N, rep=TRUE))] <- 1

> rowSums(M)
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

      



If you had not specified rep=TRUE, then colSums(M) would all have been 1 as well, but that was not what you asked for. With rep=TRUE, the rank of the resulting matrix may be less than N. If you leave out rep=TRUE, the result is a permutation matrix, which has full rank.
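The effect of sampling with and without replacement on the column sums and the rank can be checked with a short sketch. The variable names (`P`, `M`, `cols`) and the use of `qr()` to estimate the rank are illustrative choices, not part of the original answer.

```r
N <- 6

# Without replacement: each column is chosen exactly once (a permutation matrix)
P <- matrix(0, N, N)
P[cbind(1:N, sample(N))] <- 1
rowSums(P)                 # all 1
colSums(P)                 # all 1 as well
rank_without <- qr(P)$rank # equals N

# With replacement: columns may repeat, leaving some columns entirely zero
cols <- sample(N, N, replace = TRUE)
M <- matrix(0, N, N)
M[cbind(1:N, cols)] <- 1
rowSums(M)                 # still all 1
rank_with <- qr(M)$rank    # number of distinct columns that were chosen
```

With replacement the rank equals the number of distinct columns drawn, so it falls below N whenever any column is picked more than once.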



No looping solution :)

n <- 5
# for each row, the column that will receive the 1
s <- sample(n, n, TRUE)
# offset of each row's first element in a row-wise flattened vector
w <- seq(1, n*n, by = n) - 1
index <- s + w
# vector of 0s
vec <- integer(n*n)
# put the 1s in place
vec[index] <- 1
# fill the matrix row by row
matrix(vec, n, byrow = TRUE)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    0
[2,]    0    0    0    1    0
[3,]    0    0    0    0    1
[4,]    1    0    0    0    0
[5,]    1    0    0    0    0
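To see that the flat-index arithmetic places the 1 in column s[i] of row i, here is a sketch that builds the matrix both ways from the same draw and compares the results. The seed and the names `A` and `B` are assumptions for illustration.

```r
set.seed(42)                          # assumed seed, only for reproducibility
n <- 5
s <- sample(n, n, TRUE)               # chosen column for each row

# Flat-index construction: row i starts at offset (i - 1) * n when filled by row
vec <- integer(n * n)
vec[s + (seq_len(n) - 1) * n] <- 1L
A <- matrix(vec, n, byrow = TRUE)

# Two-column matrix indexing with the same column choices
B <- matrix(0L, n, n)
B[cbind(seq_len(n), s)] <- 1L

identical(A, B)                       # both constructions give the same matrix
```

Both versions call sample only once; the two-column index form avoids the manual offset computation.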

      
