How to efficiently transpose a matrix in R?

I have a large matrix that I would like to transpose without loading it into memory. I can think of three ways:

  • Write the original matrix to a .txt file column by column. Later, read it into memory line by line with readLines(...) and write those lines sequentially to a new file. The problem with this approach is that I don't know how to append to a .txt file by column rather than by row.
  • Read the matrix from the .txt file column by column and then write each column to the new file as a line. I tried this with scan(pipe("cut -f1 filename.txt")), but it opens a separate connection on each iteration and therefore takes too long, due to the overhead of opening and closing these connections.
  • Use some R function I don't know about to complete the task.

Is there something I am missing here? Do I need to do this with a separate program? Thanks in advance for your help!



3 answers


There are many better languages out there for this kind of thing. If you really want to use R, you will need to read the file one line at a time, take one element from the desired column, store it in a vector, and then write that vector out as a row. And do this for each column.



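Columns = 1e9   # placeholder: number of columns in the input file
Rows = 1e6      # placeholder: number of rows in the input file

FileName = "YourFile.csv"
NewFile = "NewFileName"

for(i in 1:Columns)
{
    # Column i of the input becomes one output row, so it holds Rows values
    ColumnToBeRow = vector("numeric", Rows)
    for(j in 1:Rows)
    {
        # Read only line j of the file, then keep its i-th field
        ColumnToBeRow[j] = read.csv(FileName, nrows=1, skip=(j - 1), header=F)[[i]]
    }
    # write.csv ignores append, so use write.table; t() writes the vector as one row
    write.table(t(ColumnToBeRow), NewFile, sep=",", append=TRUE,
                row.names=FALSE, col.names=FALSE)
}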
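The loop above re-reads the file once per element, which gets very slow. As a rough sketch of the same column-at-a-time idea with far fewer passes, the version below extracts a block of columns on each pass over the input. The function name transpose_in_passes and the parameters cols_per_pass and chunk_lines are made up for illustration. It assumes a rectangular, comma-separated file with no header and no quoted fields.

```r
# Hypothetical helper: one pass over the input per block of `cols_per_pass` columns.
transpose_in_passes <- function(in_file, out_file, n_cols,
                                cols_per_pass = 100L, chunk_lines = 10000L) {
  out <- file(out_file, open = "w")
  on.exit(close(out), add = TRUE)
  for (first in seq(1L, n_cols, by = cols_per_pass)) {
    cols <- first:min(first + cols_per_pass - 1L, n_cols)
    pieces <- list()
    con <- file(in_file, open = "r")
    repeat {
      lines <- readLines(con, n = chunk_lines)
      if (length(lines) == 0L) break
      fields <- strsplit(lines, ",", fixed = TRUE)
      # Keep only the wanted fields; the result is length(cols) x length(lines),
      # i.e. this slice of the input is already transposed
      piece <- vapply(fields, function(f) f[cols], character(length(cols)))
      pieces[[length(pieces) + 1L]] <- matrix(piece, nrow = length(cols))
    }
    close(con)
    if (length(pieces) == 0L) break          # empty input file
    block <- do.call(cbind, pieces)          # length(cols) x total rows
    writeLines(apply(block, 1L, paste, collapse = ","), out)
  }
  invisible(out_file)
}
```

Memory use is roughly cols_per_pass times the number of input rows, so cols_per_pass is the knob that trades memory for the number of passes over the file.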

      



This post on the R-help mailing list includes my naive (pseudo?) code to split an input file into n transposed output files, then read chunks from the n output files in turn to stitch the transposed columns back together. It is efficient to work in chunks of rows during both the transposition phase and the stitching phase. It's worth asking what you were hoping to do after transposing the matrix into a file that still won't fit in memory. There is also scientific literature on efficient out-of-memory matrix transposition (for example).
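The split-and-stitch idea can be sketched roughly as follows. This is not the code from the mailing-list post; the function name transpose_by_chunks and the parameter chunk_rows are invented for illustration. It assumes a rectangular, comma-separated file with no header and no quoted fields.

```r
# Hypothetical helper illustrating split-then-stitch transposition.
transpose_by_chunks <- function(in_file, out_file, chunk_rows = 1000L) {
  # Phase 1: transpose each block of rows into its own temporary file
  con <- file(in_file, open = "r")
  chunk_files <- character(0)
  repeat {
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0L) break
    block <- do.call(rbind, strsplit(lines, ",", fixed = TRUE))  # rows x cols
    tmp <- tempfile(fileext = ".csv")
    # One output line per input column of this block
    writeLines(apply(block, 2L, paste, collapse = ","), tmp)
    chunk_files <- c(chunk_files, tmp)
  }
  close(con)

  # Phase 2: read one line from every chunk file in turn and join the pieces
  ins <- lapply(chunk_files, file, open = "r")
  out <- file(out_file, open = "w")
  repeat {
    parts <- lapply(ins, readLines, n = 1L)
    if (length(parts) == 0L || any(lengths(parts) == 0L)) break
    writeLines(paste(unlist(parts), collapse = ","), out)
  }
  close(out)
  for (chunk_con in ins) close(chunk_con)
  unlink(chunk_files)
  invisible(out_file)
}
```

R caps the number of connections that can be open at once, so chunk_rows has to be large enough to keep the number of temporary files small. Only one block of chunk_rows rows is held in memory at a time.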





scan can read the file as a stream, and all you need to add to the mix is the number of rows. Since your original matrix has a dimension attribute, you just need to store its number of columns and use that as the number of rows when reading.

 MASS::write.matrix(matrix(1:30, 6), file="test.txt")

 matrix( scan("test.txt"), 5)

#-------------
Read 30 items
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    3    4    5    6
[2,]    7    8    9   10   11   12
[3,]   13   14   15   16   17   18
[4,]   19   20   21   22   23   24
[5,]   25   26   27   28   29   30
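Why this works: the file stores the values row by row, while matrix() fills column by column, so reading with the row and column counts swapped yields the transpose. Here is a small sketch that wraps this and checks it against t(). The helper name read_transposed is made up, and write.table stands in for MASS::write.matrix. Note that the whole result still ends up in memory.

```r
# Hypothetical helper: read a whitespace-separated numeric file that has
# n_cols columns and return its transpose.
read_transposed <- function(path, n_cols) {
  values <- scan(path, quiet = TRUE)   # values arrive in row-by-row order
  matrix(values, nrow = n_cols)        # filled column by column => transposed
}

m <- matrix(1:30, 6)
f <- tempfile(fileext = ".txt")
write.table(m, f, row.names = FALSE, col.names = FALSE)

tm <- read_transposed(f, n_cols = ncol(m))
stopifnot(all(dim(tm) == c(5, 6)), all(tm == t(m)))
```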

      

I suspect that your own code for writing matrix rows out as lines will not be as fast as Prof. Ripley's MASS package, but if I am wrong you should suggest the improvement to Prof. Ripley.
