In R, loop through a directory and store the filename in a column
I'm trying to do something in R that shouldn't be too hard, I guess. I have a folder with many, many files. Their names all look like this:
airbag.WS-U-E-A.lst
Here "." is the delimiter and ".lst" is the extension (the files are read as plain text).
Each file contains one entry per line, like this:
/home/nobackup/SONAR/COMPACT/WR-U-E-A/WR-U-E-A0000075.data.ids.xml: <sentence>ja voor den airbag op te pompen eh :p</sentence>
/home/nobackup/SONAR/COMPACT/WR-U-E-A/WR-U-E-A0000129.data.ids.xml: <sentence>Dobby , als ze valt heeft ze dan wel al ne airbag hee</sentence>
What I want to do is, in R, create a new dataset containing data from all files. Ideally it would look like this:
ID | filename            | word   | component | left-context                               | right-context
---------------------------------------------------------------------------------------------------------------
1  | airbag.WS-U-E-A.lst | airbag | WS-U-E-A  | ja voor den                                | op te pompen eh :p
2  | airbag.WS-U-E-A.lst | airbag | WS-U-E-A  | Dobby , als ze valt heeft ze dan wel al ne | hee
I should be able to generate most of this content with some regex on the file contents, but I'm not sure how to loop over all the files. For example, I would extract the word and component from the filename with a regex, but how do I store the filename of each file in a column?
I tried the following
files <- list.files(path="", pattern="*.lst", full.names=T, recursive=FALSE)
lapply(files, function(x) {
t <- dirname(x)
out <- function(t)
})
t
But the error received was
Error: unexpected '}' in:
"out <- function(t)
}"
As David Arrenburg posted in the comments (he has not yet posted it as an answer :D), the solution is to apply basename to files with a function from the apply family:
lapply(files, basename)
This returns a list. For convenience, a vector is usually better; in that case, use sapply:
sapply(files, basename)
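To go from the filenames to the table described in the question, something along these lines might work. It is only a sketch under several assumptions: the helper names read_lst and read_lst_dir are made up for illustration, the filename is assumed to always have the form word.component.lst, each line is assumed to contain exactly one <sentence>...</sentence> element, and the sentence is split at the first occurrence of the word. Note that the pattern argument of list.files is a regular expression, not a glob, so "\\.lst$" is used instead of "*.lst".

```r
# Hypothetical helper: read one .lst file into a data frame that carries
# the filename (and the parts derived from it) in its own columns.
read_lst <- function(path) {
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  n     <- length(lines)
  fname <- basename(path)                           # "airbag.WS-U-E-A.lst"
  parts <- strsplit(fname, ".", fixed = TRUE)[[1]]  # "airbag" "WS-U-E-A" "lst"
  word  <- parts[1]

  # Keep only the text between <sentence> and </sentence>
  sentence <- sub("^.*<sentence>(.*)</sentence>.*$", "\\1", lines)

  # Split each sentence at the first occurrence of the word:
  # element 1 is the left context, element 2 the right context
  # (NA if the word does not occur in the sentence)
  pieces <- regmatches(sentence, regexpr(word, sentence, fixed = TRUE),
                       invert = TRUE)

  data.frame(
    filename      = rep(fname, n),
    word          = rep(word, n),
    component     = rep(parts[2], n),
    left_context  = trimws(vapply(pieces, `[`, character(1), 1)),
    right_context = trimws(vapply(pieces, `[`, character(1), 2)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: apply read_lst to every .lst file in a directory,
# stack the results and add a running ID column.
read_lst_dir <- function(dir = ".") {
  files <- list.files(path = dir, pattern = "\\.lst$", full.names = TRUE)
  out   <- do.call(rbind, lapply(files, read_lst))
  if (!is.null(out)) out <- cbind(ID = seq_len(nrow(out)), out)
  out
}
```

Calling read_lst_dir("path/to/folder") (the path is a placeholder) should then return one data frame for all files, or NULL if the folder contains no .lst files. For a very large number of files, repeated rbind can be slow, so a dedicated row-binding function from a package may be worth considering.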