Partition a matrix into N equal-sized chunks in R
How can I split a matrix or data frame into N equal-sized chunks using R? I want to slice the matrix or data frame horizontally, i.e. by rows.
For example, given:
r = 8
c = 10
number_of_chunks = 4
data = matrix(seq(r*c), nrow = r, ncol=c)
> data
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 9 17 25 33 41 49 57 65 73
[2,] 2 10 18 26 34 42 50 58 66 74
[3,] 3 11 19 27 35 43 51 59 67 75
[4,] 4 12 20 28 36 44 52 60 68 76
[5,] 5 13 21 29 37 45 53 61 69 77
[6,] 6 14 22 30 38 46 54 62 70 78
[7,] 7 15 23 31 39 47 55 63 71 79
[8,] 8 16 24 32 40 48 56 64 72 80
I would like to split data into a list of 4 elements:
Element 1:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 9 17 25 33 41 49 57 65 73
[2,] 2 10 18 26 34 42 50 58 66 74
Element 2:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[3,] 3 11 19 27 35 43 51 59 67 75
[4,] 4 12 20 28 36 44 52 60 68 76
Element 3:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[5,] 5 13 21 29 37 45 53 61 69 77
[6,] 6 14 22 30 38 46 54 62 70 78
Element 4:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[7,] 7 15 23 31 39 47 55 63 71 79
[8,] 8 16 24 32 40 48 56 64 72 80
With numpy in Python, I can use numpy.array_split.
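For comparison, the numpy documentation describes numpy.array_split as returning, for an axis of length l split into n sections, l % n sub-arrays of size l//n + 1 and the rest of size l//n. A minimal base-R sketch that mimics that behavior for rows might look like the following; array_split_rows is a hypothetical name, not an existing R function.

```r
# Hypothetical helper (not part of base R): split the rows of a matrix or
# data frame into n chunks, numpy.array_split style. The first nrow %% n
# chunks get one extra row; if n > nrow, the trailing chunks are empty.
array_split_rows <- function(x, n) {
  nr <- NROW(x)
  base <- nr %/% n                          # minimum rows per chunk
  extra <- nr %% n                          # chunks that get one extra row
  sizes <- base + (seq_len(n) <= extra)     # e.g. 3 3 2 2 for 10 rows, n = 4
  # Chunk label for every row; keeping all levels preserves empty chunks
  groups <- factor(rep(seq_len(n), times = sizes), levels = seq_len(n))
  # Split the row indices (not the data), then extract each block of rows
  lapply(split(seq_len(nr), groups), function(idx) x[idx, , drop = FALSE])
}

data <- matrix(seq(8 * 10), nrow = 8, ncol = 10)
chunks <- array_split_rows(data, 4)   # four 2 x 10 matrices
```

drop = FALSE keeps one-row chunks as matrices instead of collapsing them to vectors, and the same indexing works unchanged on a data frame.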
Here's an attempt in base R. Compute "pretty" breakpoints for the sequence of row numbers with pretty. Assign each row number to an interval with cut, and split the row numbers into a list by interval with split. Finally, loop over the list of row-index vectors with lapply, extracting each matrix subset with [.
lapply(split(seq_len(nrow(data)),
cut(seq_len(nrow(data)), pretty(seq_len(nrow(data)), number_of_chunks))),
function(x) data[x, ])
$`(0,2]`
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 9 17 25 33 41 49 57 65 73
[2,] 2 10 18 26 34 42 50 58 66 74
$`(2,4]`
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 3 11 19 27 35 43 51 59 67 75
[2,] 4 12 20 28 36 44 52 60 68 76
$`(4,6]`
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 5 13 21 29 37 45 53 61 69 77
[2,] 6 14 22 30 38 46 54 62 70 78
$`(6,8]`
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 7 15 23 31 39 47 55 63 71 79
[2,] 8 16 24 32 40 48 56 64 72 80
Roll this up into a function:
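array_split <- function(data, number_of_chunks) {
  # Row indices 1..nrow(data)
  rowIdx <- seq_len(nrow(data))
  # Group the row indices by "pretty" breakpoints, then extract each group of rows
  # (drop = FALSE keeps single-row chunks as matrices)
  lapply(split(rowIdx, cut(rowIdx, pretty(rowIdx, number_of_chunks))),
         function(x) data[x, , drop = FALSE])
}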
Then you can use
array_split(data=data, number_of_chunks=number_of_chunks)
to return the same result as above.
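One caveat: the R documentation describes pretty() as computing "about n+1" round values, so depending on the number of rows the approach above can return a different number of chunks than number_of_chunks. If you need exactly that many groups, one option is to pass the number of intervals straight to cut(), which then divides the range of row numbers into that many equal-width intervals. This is a sketch; array_split_cut is a hypothetical name.

```r
# Hypothetical variant of array_split(): with a single number as breaks,
# cut() divides the range of row numbers into that many equal-width
# intervals; labels = FALSE returns plain integer chunk codes.
array_split_cut <- function(data, number_of_chunks) {
  rowIdx <- seq_len(nrow(data))
  groups <- cut(rowIdx, breaks = number_of_chunks, labels = FALSE)
  lapply(split(rowIdx, groups), function(x) data[x, , drop = FALSE])
}

data <- matrix(seq(8 * 10), nrow = 8, ncol = 10)
chunks <- array_split_cut(data, 4)    # rows 1-2, 3-4, 5-6, 7-8
```

cut() requires breaks to be at least 2 when it is a single number, so number_of_chunks = 1 needs special handling. Because groups is an integer vector rather than a factor, any interval that receives no rows (possible when there are fewer rows than chunks) is simply absent from the result.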
A nice simplification suggested by @user20650:
split.data.frame(data,
cut(seq_len(nrow(data)), pretty(seq_len(nrow(data)), number_of_chunks)))
Surprisingly, split.data.frame returns a list of matrices when its first argument is a matrix.
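For an actual data frame, the generic split() dispatches to split.data.frame() by itself, so the same grouping yields a list of data frames. A small check, converting the example matrix with as.data.frame() purely for illustration (df and df_chunks are illustrative names):

```r
number_of_chunks <- 4
data <- matrix(seq(8 * 10), nrow = 8, ncol = 10)
df <- as.data.frame(data)                 # columns V1..V10

rowIdx <- seq_len(nrow(df))
# split() dispatches to split.data.frame() because df is a data frame,
# so each element is a data frame holding a block of consecutive rows
df_chunks <- split(df, cut(rowIdx, pretty(rowIdx, number_of_chunks)))
```

Each chunk keeps the original row names (for example "3" and "4" in the second chunk), which makes it easy to trace rows back to the source.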
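number_of_chunks = 4
# Start a chunk every ceiling(nrow / number_of_chunks) rows; the last chunk may be shorter
lapply(seq(1, NROW(data), ceiling(NROW(data)/number_of_chunks)),
       function(i) data[i:min(i + ceiling(NROW(data)/number_of_chunks) - 1, NROW(data)), , drop = FALSE])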
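Because this version fixes the chunk size at ceiling(NROW(data)/number_of_chunks), it can return fewer chunks than requested when the row count is not a multiple of number_of_chunks. Wrapping it in a function (split_by_ceiling is a hypothetical name) makes that easy to check:

```r
# Hypothetical wrapper around the seq()-based approach above
split_by_ceiling <- function(data, number_of_chunks) {
  size <- ceiling(NROW(data) / number_of_chunks)   # rows per chunk
  starts <- seq(1, NROW(data), size)               # first row of each chunk
  lapply(starts, function(i) data[i:min(i + size - 1, NROW(data)), , drop = FALSE])
}

m9 <- matrix(1:18, nrow = 9)
length(split_by_ceiling(m9, 4))   # size 3, starts 1, 4, 7: only 3 chunks
```

With the 8-row example matrix it still gives the expected 4 chunks of 2 rows.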
OR
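# Requires NROW(data) to be a multiple of number_of_chunks.
# split() treats the matrix as a vector (column-major) and recycles the row-group labels down each column
lapply(split(data, rep(1:number_of_chunks, each = NROW(data)/number_of_chunks)),
       function(a) matrix(a, ncol = NCOL(data)))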
Try not to split the data itself, because that creates a separate copy of it. Split the indices you want to access instead.
With this function you can split either into a given number of chunks (e.g., for parallelism) or into chunks of a given size.
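# m: number of rows (or elements) to cut up
# block.size: target chunk size; nb: number of chunks (derived from block.size if not given)
CutBySize <- function(m, block.size, nb = ceiling(m / block.size)) {
  int <- m / nb                      # ideal (fractional) chunk length
  upper <- round(1:nb * int)         # last index of each chunk
  lower <- c(1, upper[-nb] + 1)      # first index of each chunk
  size <- c(upper[1], diff(upper))   # number of indices in each chunk
  cbind(lower, upper, size)
}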
CutBySize(nrow(data), nb = number_of_chunks)
lower upper size
[1,] 1 2 2
[2,] 3 4 2
[3,] 5 6 2
[4,] 7 8 2
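A sketch of the index-based idea: keep only the lower/upper bounds and extract each block of rows when it is needed, so the full list of chunks is never built at once. The per-chunk colSums() is only a placeholder workload, and CutBySize() is repeated so the snippet runs on its own.

```r
CutBySize <- function(m, block.size, nb = ceiling(m / block.size)) {
  int <- m / nb
  upper <- round(1:nb * int)
  lower <- c(1, upper[-nb] + 1)
  size <- c(upper[1], diff(upper))
  cbind(lower, upper, size)
}

data <- matrix(seq(8 * 10), nrow = 8, ncol = 10)

# Split by number of chunks: a 4 x 3 matrix of index bounds
bounds <- CutBySize(nrow(data), nb = 4)

# Visit one chunk at a time; only the current block of rows is extracted
chunk_sums <- lapply(seq_len(nrow(bounds)), function(i) {
  rows <- bounds[i, "lower"]:bounds[i, "upper"]
  colSums(data[rows, , drop = FALSE])
})

# Split by chunk size instead: blocks of about 3 rows
by_size <- CutBySize(nrow(data), block.size = 3)
```

With 8 rows and block.size = 3 this gives ceiling(8/3) = 3 chunks; because the upper bounds are rounded from multiples of 8/3, the sizes come out as 3, 2, 3 rather than 3, 3, 2.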