Read selected files from directory based on selection criteria in R

I would like to read only selected .txt files in a folder in order to build a giant table ... I have over 9K files and I would like to import files with the selected distance and building type as indicated by part of the file name.

For example, I want to first select files with a name containing "_U0" and "_0_Final.txt":

Type = c(0,1)
D3Test = 1
Distance = c(0,50,150,300,650,800)
D2Test = 1;

files <- list.files(path=data.folder, pattern=paste("*U", Type[D3Test],"*_",Distance[D2Test],"_Final.txt",sep=""))

      

But the result comes back empty ... Is there a problem with how I construct the pattern?

 filename <- scan(what="")
 "M10_F1_T1_D1_U0_H1_0_Final.txt"   "M10_F1_T1_D1_U0_H1_150_Final.txt" "M10_F1_T1_D1_U0_H1_300_Final.txt"
 "M10_F1_T1_D1_U0_H1_50_Final.txt"  "M10_F1_T1_D1_U0_H1_650_Final.txt" "M10_F1_T1_D1_U0_H1_800_Final.txt"
 "M10_F1_T1_D1_U0_H2_0_Final.txt"   "M10_F1_T1_D1_U0_H2_150_Final.txt" "M10_F1_T1_D1_U0_H2_300_Final.txt"
 "M10_F1_T1_D1_U0_H2_50_Final.txt"  "M10_F1_T1_D1_U0_H2_650_Final.txt" "M10_F1_T1_D1_U0_H2_800_Final.txt"
 "M10_F1_T1_D1_U0_H3_0_Final.txt"   "M10_F1_T1_D1_U0_H3_150_Final.txt" "M10_F1_T1_D1_U0_H3_300_Final.txt"
 "M10_F1_T1_D1_U0_H3_50_Final.txt"  "M10_F1_T1_D1_U0_H3_650_Final.txt" "M10_F1_T1_D1_U0_H3_800_Final.txt"
 "M10_F1_T1_D1_U1_H1_0_Final.txt"   "M10_F1_T1_D1_U1_H1_150_Final.txt" "M10_F1_T1_D1_U1_H1_300_Final.txt"
 "M10_F1_T1_D1_U1_H1_50_Final.txt"  "M10_F1_T1_D1_U1_H1_650_Final.txt" "M10_F1_T1_D1_U1_H1_800_Final.txt"

      



3 answers


Another way would be to use sprintf and grepl.



x <- c("M10_F1_T1_D1_U0_H1_150_Final.txt", "M10_F1_T1_D1_U0_H2_650_Final.txt", "M10_F1_T1_D1_U1_H1_650_Final.txt")

x[grepl(sprintf("U%i_H%i_%i", 1, 1, 650), x)]

[1] "M10_F1_T1_D1_U1_H1_650_Final.txt"
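The same idea can be extended to select several distances at once. This is only a minimal sketch, assuming the file names follow the layout shown in the question; the helper name select_files is hypothetical and not part of the original answer:

```r
# Sample names copied from the question.
x <- c("M10_F1_T1_D1_U0_H1_150_Final.txt",
       "M10_F1_T1_D1_U0_H2_650_Final.txt",
       "M10_F1_T1_D1_U1_H1_650_Final.txt")

# Hypothetical helper: keep names with building type `type`
# and any one of the given `distances`.
select_files <- function(names, type, distances) {
  # Builds e.g. "_U0_H[0-9]+_(150|650)_Final\.txt$"
  pattern <- sprintf("_U%d_H[0-9]+_(%s)_Final\\.txt$",
                     type, paste(distances, collapse = "|"))
  names[grepl(pattern, names)]
}

selected <- select_files(x, type = 0, distances = c(150, 650))
print(selected)
```

Anchoring the pattern with "$" and escaping the period keeps it from matching names that merely contain a similar fragment.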

      



You should look at the pattern that your paste call actually produces:

"*U0*_0_Final.txt"

      

It does not match any of these filenames. In a regular expression, the asterisk after the "0" means zero or more instances of "0" between "U" and the underscore, so nothing else is allowed in that gap. Assuming the type and distance are not the values marked with T and D in the filenames, this gives the correct pattern:



grep( pattern=paste0("_U", Type[D3Test],".*_", Distance[D2Test],"_Final\\.txt"), filename)
#-----------
#[1]  1  7 13   So matches 3 filenames

      

Note that you need to escape (with two backslashes) any period that you want to match only a literal period, because the period is a special character in regular expressions. You also need to use ".*" to allow for the intervening characters in the pattern.
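To see the difference side by side, here is a small sketch that runs both patterns over a few of the names from the question. The variable names bad, good, bad_hits and good_hits are chosen for illustration only:

```r
# Three names taken from the question's listing.
filename <- c("M10_F1_T1_D1_U0_H1_0_Final.txt",
              "M10_F1_T1_D1_U0_H1_150_Final.txt",
              "M10_F1_T1_D1_U1_H1_0_Final.txt")

# Pattern produced by the original paste call:
# "0*" allows only zeros between "U" and "_0_Final".
bad <- "*U0*_0_Final.txt"
# Corrected pattern: ".*" allows any characters in the gap,
# and "\\." matches a literal period.
good <- "_U0.*_0_Final\\.txt"

bad_hits  <- grepl(bad, filename)
good_hits <- grepl(good, filename)
print(bad_hits)
print(good_hits)

# An unescaped period matches any single character.
dot_unescaped <- grepl("Final.txt", "FinalXtxt")
dot_escaped   <- grepl("Final\\.txt", "FinalXtxt")
```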



files <- list.files(path=data.folder, pattern=paste("*U", Type[D3Test], "....",Distance[D2Test], sep=""))

      

I have revised my code and it works! The idea is to use one dot for each character between Type[D3Test] and Distance[D2Test], since there are always exactly four characters between the two.
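The four dots rely on that fixed width. As a rough sketch of the trade-off, the code below builds a scratch folder of empty files (the folder location and the two-digit "H10" name are made up for illustration) and compares the fixed-width pattern with a variable-width one:

```r
# Scratch folder with empty files named like those in the question.
data.folder <- file.path(tempdir(), "pattern_demo")  # hypothetical location
dir.create(data.folder, showWarnings = FALSE)
demo_names <- c("M10_F1_T1_D1_U0_H1_0_Final.txt",
                "M10_F1_T1_D1_U0_H1_50_Final.txt",
                "M10_F1_T1_D1_U0_H10_0_Final.txt",  # hypothetical two-digit H value
                "M10_F1_T1_D1_U1_H1_0_Final.txt")
invisible(file.create(file.path(data.folder, demo_names)))

Type <- c(0, 1); D3Test <- 1
Distance <- c(0, 50, 150, 300, 650, 800); D2Test <- 1

# Fixed-width version: exactly four characters after the type
# (the leading "*" from the post is left out here).
fixed <- list.files(data.folder,
                    pattern = paste0("U", Type[D3Test], "....", Distance[D2Test]))

# Variable-width version: any run of characters in the gap,
# then "_<distance>_Final.txt" at the end of the name.
flexible <- list.files(data.folder,
                       pattern = paste0("_U", Type[D3Test], "_.*_",
                                        Distance[D2Test], "_Final\\.txt$"))
print(fixed)
print(flexible)

unlink(data.folder, recursive = TRUE)  # clean up the scratch folder
```

With these demo files the fixed-width pattern misses the "H10" name, while the variable-width pattern picks it up; for the names listed in the question both approaches select the same files.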

Thanks to: http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
