In R, loop over the files in a directory and store each filename in a column

I'm trying to do something in R that shouldn't be too hard, I guess. I have a folder with many, many files. Their names all look like this:

airbag.WS-U-E-A.lst

      

That is, a word and a component separated by a . delimiter, followed by the .lst extension (the files are read as text).

Each file contains one record per line, like this:

/home/nobackup/SONAR/COMPACT/WR-U-E-A/WR-U-E-A0000075.data.ids.xml:  <sentence>ja voor den airbag op te pompen eh :p</sentence>
/home/nobackup/SONAR/COMPACT/WR-U-E-A/WR-U-E-A0000129.data.ids.xml:  <sentence>Dobby , als ze valt heeft ze dan wel al ne airbag hee</sentence>

      

What I want to do is, in R, create a new dataset containing data from all files. Ideally it would look like this:

ID | filename             | word | component | left-context                               | right-context
---------------------------------------------------------------------------------------
1    airbag.WS-U-E-A.lst   airbag   WS-U-E-A    ja voor den                                  op te pompen eh :p
2    airbag.WS-U-E-A.lst   airbag   WS-U-E-A    Dobby , als ze valt heeft ze dan wel al ne   hee

      

Generating this content is something I should be able to do with some regex on the files; however, I'm not sure how to loop over all the files. For example, I would get the word and component by applying a regex to the filename, but how do I store the filename of each file in a column?

I tried the following:

files <- list.files(path="", pattern="*.lst", full.names=T, recursive=FALSE)
lapply(files, function(x) {
    t <- dirname(x)
    out <- function(t)
})

t

      

But I received this error:

Error: unexpected '}' in:
"out <- function(t)
}"

      



1 answer


As David Arrenburg pointed out in the comments (he has not posted it as an answer yet :D), the solution is to apply the function basename to the files:

lapply(files, basename)



This returns a list. A vector is usually more convenient to work with; in that case, use sapply:

sapply(files, basename)
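Note that sapply(files, basename) returns a vector named by the full paths; since basename is vectorized, basename(files) gives the same values without the names. To go one step further towards the table in the question, here is a minimal sketch rather than a definitive implementation. It assumes every filename has the form word.component.lst with no extra dots in the word, that each line contains exactly one <sentence>...</sentence> element, and that the context is split at the first occurrence of the word. The helper read_lst, the column names left_context and right_context, and the demo directory are hypothetical names introduced only for illustration. The pattern is written as "\\.lst$" because the pattern argument of list.files is a regular expression, not a glob.

```r
# Demo setup (assumption): write the two sample lines from the question
# into a temporary folder so the sketch runs on its own.
demo_dir <- file.path(tempdir(), "lst_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c(
  "/home/nobackup/SONAR/COMPACT/WR-U-E-A/WR-U-E-A0000075.data.ids.xml:  <sentence>ja voor den airbag op te pompen eh :p</sentence>",
  "/home/nobackup/SONAR/COMPACT/WR-U-E-A/WR-U-E-A0000129.data.ids.xml:  <sentence>Dobby , als ze valt heeft ze dan wel al ne airbag hee</sentence>"
), file.path(demo_dir, "airbag.WS-U-E-A.lst"))

# Hypothetical helper: read one .lst file and tag every row with its filename.
read_lst <- function(path) {
  lines <- readLines(path, encoding = "UTF-8")
  n     <- length(lines)
  fname <- basename(path)

  # "airbag.WS-U-E-A.lst" -> c("airbag", "WS-U-E-A")
  parts <- strsplit(sub("\\.lst$", "", fname), ".", fixed = TRUE)[[1]]
  word  <- parts[1]

  # Keep only the text between <sentence> and </sentence>.
  sentence <- sub(".*<sentence>(.*)</sentence>.*", "\\1", lines)

  # Split each sentence at the first occurrence of the word.
  pos   <- regexpr(word, sentence, fixed = TRUE)
  left  <- trimws(substr(sentence, 1, pos - 1))
  right <- trimws(substr(sentence, pos + nchar(word), nchar(sentence)))
  left[pos < 0]  <- NA_character_  # word not found in this line
  right[pos < 0] <- NA_character_

  data.frame(filename      = rep(fname, n),
             word          = rep(word, n),
             component     = rep(parts[2], n),
             left_context  = left,
             right_context = right,
             stringsAsFactors = FALSE)
}

# Point list.files at your own folder instead of demo_dir.
files  <- list.files(path = demo_dir, pattern = "\\.lst$", full.names = TRUE)
result <- do.call(rbind, lapply(files, read_lst))
result <- cbind(ID = seq_len(nrow(result)), result)
print(result)
```

Each per-file data frame carries its own filename column, so nothing needs to be tracked outside the lapply call; rbind then stacks the pieces. The sketch assumes at least one matching file exists.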

      
