Working with Excel files in R

I still struggle every time I work with an Excel file in R.

What's the best way to do the following?

1 - Import an Excel file into R as a whole workbook and be able to run analysis on any sheet in it? If you are thinking of suggesting XLConnect, please be aware of the Java out-of-memory problem. I have files over 30 MB, and the Java memory issue costs me more time on every run (the -Xmx startup option doesn't work for me).

2 - Not lose any material from any Excel sheet? Saving the file to CSV reports that some sheets are "out of range", meaning beyond 65,536 rows and 256 columns. It also cannot handle some formulas.

3 - Avoid importing each sheet separately? Importing sheets into SPSS, Stata, or EViews, saving them in that program's format, and then working with the output file in R works well in most cases. However, this method has two main problems: you have to install the extra software on your computer, and you can only import one sheet at a time. If I have more than 30 sheets, it takes a long time.

This may be a perennial question that has been answered many times, but each answer has solved part of the problem rather than the whole problem. There doesn't seem to be a single comprehensive solution.

I'm on Mac OS 10.10 with R 3.1.1





2 answers


I tried several packages for opening Excel files, and openxlsx is by far the best route. It is faster and more stable than the others; the key function is openxlsx::read.xlsx. My advice is to read the entire sheet once and then manipulate the data inside R, rather than reading parts of the sheet multiple times. I've used it to open large Excel files (8000+ columns by 1000+ rows) and it has always worked well. I was using the xlsx package to write to Excel, but it had a lot of memory issues, so I switched to openxlsx for writing as well.
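To illustrate the whole-workbook approach from point 1, here is a minimal sketch using openxlsx (the file name "report.xlsx" is a placeholder; getSheetNames and read.xlsx are the relevant openxlsx functions):

```r
library(openxlsx)

# list the sheet names, then read each sheet into a named list of data frames
# ("report.xlsx" is a hypothetical file name)
path <- "report.xlsx"
sheet_names <- getSheetNames(path)
all_sheets <- lapply(sheet_names, function(s) read.xlsx(path, sheet = s))
names(all_sheets) <- sheet_names

# any sheet is now available by name for analysis
head(all_sheets[[1]])
```

Reading every sheet once into a list keeps the file I/O down to a single pass, after which everything happens in R.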



As a side note, if you want to use R together with Excel, you sometimes need to execute VBA code from R. I found the procedure quite complex, and I have fully documented the correct way to do it in a previous question on Stack Overflow: Apply VBA from R.
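The linked answer has the details; one common pattern (a Windows-only sketch, with the workbook path and macro name as placeholders) is to drive Excel through a small VBScript generated from R:

```r
# hypothetical sketch: run an Excel VBA macro from R via a helper VBScript
# (Windows-only; assumes book.xlsm contains a macro named "MyMacro")
vbs <- 'Set xl = CreateObject("Excel.Application")
xl.Workbooks.Open "C:\\data\\book.xlsm"
xl.Run "MyMacro"
xl.ActiveWorkbook.Save
xl.Quit'
writeLines(vbs, "run_macro.vbs")
system2("cscript", "run_macro.vbs")
```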





Consider using the xlsx package. It has functions for working with Excel workbooks and worksheets. Your question is quite broad, but I think this might serve as an example:

library(xlsx)
wb <- loadWorkbook('r_test.xlsx')
sheets <- getSheets(wb)
sheet <- sheets[[1]]
df <- readColumns(sheet, 
                  startColumn =  1, endColumn =  3, 
                  startRow = 1, endRow = 6)
df
##  id name x_value
##1  1    A      10
##2  2    B      15
##3  3    C      20
##4  4    D      13
##5  5    E      17


As for the memory issue, I think you should check out the ff package:

The ff package provides data structures that are stored on disk but behave (almost) as if they were in RAM, by transparently mapping only a section (pagesize) into main memory.
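As a sketch of what that looks like in practice, assuming the data has first been exported from Excel to CSV (the file "big.csv" and the column "value" are placeholders):

```r
library(ff)

# read a large CSV into a disk-backed ffdf; only pages are held in RAM
big <- read.csv.ffdf(file = "big.csv", header = TRUE)
dim(big)            # dimensions, without loading the whole table into memory
mean(big$value[])   # [] materializes a single column only when needed
```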




Another option (possibly overkill) would be to load the data into a real database and work through database connections. If you are dealing with really large datasets, a database is the best approach.


Remember, when working with R and a database, delegate as much of the load as possible to the database (filtering, aggregation, etc.) and use R only for the final results.
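A minimal sketch of that division of labor using DBI and RSQLite (the table and column names are made up): the GROUP BY runs inside the database, and only the small summary comes back to R.

```r
library(DBI)
library(RSQLite)

# toy data standing in for a sheet imported from Excel
sales <- data.frame(region = c("N", "S", "N"), amount = c(10, 20, 30))

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "sales", sales)

# aggregation is delegated to the database; R only sees the result
res <- dbGetQuery(con, "SELECT region, AVG(amount) AS avg_amount
                        FROM sales GROUP BY region")
dbDisconnect(con)
res
```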









