R: sort strings by the numbers inside them

I have a list of filenames to be opened. The format is as follows:

'xxxxx_xxxxxx 00.02.xls'

      

The first number (00) is the year and the second (02) is the month.

I need to sort this list by year first, then by month.



2 answers


If the number of characters in the filename can vary, a regex can find the year and month for you. I like str_match from the stringr package.

library(stringr)
extract <- str_match(vec, "([0-9]{2})\\.([0-9]{2})\\.xls")
vec[order(rank(extract[,2]))]

      

If you later decide to order by month instead, change the 2 in the last line to 3.

If you want the years in descending order, wrap the call to order in rev, like this: vec[rev(order(rank(extract[,2])))]



The great thing about str_match is that it tells you what it matched and creates a column for each token you put in parentheses. You can then index and manipulate these columns as you would those of any other character matrix.

extract
     [,1]        [,2] [,3]
[1,] "07.02.xls" "07" "02"
[2,] "15.12.xls" "15" "12"
[3,] "01.02.xls" "01" "02"

      

Example

vec <- c("xxxxxxxx_xxxxxx 07.02.xls", "xxxxx_xxx 15.12.xls", "xxxxx_xxxxxx 01.02.xls")
extract <- str_match(vec, "([0-9]{2})\\.([0-9]{2})\\.xls")
vec[order(rank(extract[,2]))]
[1] "xxxxx_xxxxxx 01.02.xls"    "xxxxxxxx_xxxxxx 07.02.xls" "xxxxx_xxx 15.12.xls" 

#or reversed

vec[rev(order(rank(extract[,2])))]
[1] "xxxxx_xxx 15.12.xls"       "xxxxxxxx_xxxxxx 07.02.xls" "xxxxx_xxxxxx 01.02.xls" 
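
Since the question asks for year first and then month, both captured columns can serve as sort keys. Here is a minimal sketch, assuming every filename ends in yy.mm.xls. It uses base R's regexec and regmatches in place of str_match so that it runs without extra packages, and the sample vector is made up for illustration. With stringr loaded, columns 2 and 3 of a str_match result could be passed to order in the same way.

```r
# Hypothetical filenames in the question's format.
vec <- c("xxxxxxxx_xxxxxx 07.02.xls",
         "xxxxx_xxx 15.12.xls",
         "xxxxx_xxxxxx 07.01.xls",
         "xxxxx_xxxxxx 01.02.xls")

# Capture the two-digit year and month just before the ".xls" extension.
pattern <- "([0-9]{2})\\.([0-9]{2})\\.xls$"
matches <- regmatches(vec, regexec(pattern, vec))

# Each list element is c(full match, year, month); bind them into a
# character matrix with the same column layout as a str_match result.
# Assumes every filename matches the pattern.
extract <- do.call(rbind, matches)

year  <- as.integer(extract[, 2])
month <- as.integer(extract[, 3])

# Sort by year first, then by month within each year.
sorted <- vec[order(year, month)]

# Newest first: decreasing = TRUE applies to both keys.
newest_first <- vec[order(year, month, decreasing = TRUE)]
```

order uses the second key only to break ties in the first, so files from the same year come out in month order.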

      



If there are always 13 characters in front of the two-digit fields, you can do this (assuming your vector of filenames is named x):



x[order(substr(x,14,18))]
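
This works because characters 14 to 18 hold the zero-padded "yy.mm" text, so plain string order is already year-then-month order. Here is a small sketch with made-up filenames. The second variant, which counts back from the end of the name, is my own assumption for the case where the prefix length varies but the name always ends in yy.mm.xls.

```r
# Hypothetical filenames, each with exactly 13 characters before the year.
x <- c("xxxxx_xxxxxx 07.02.xls",
       "xxxxx_xxxxxx 15.12.xls",
       "xxxxx_xxxxxx 07.01.xls",
       "xxxxx_xxxxxx 00.02.xls")

# Characters 14 to 18 are the "yy.mm" part of each name.
key <- substr(x, 14, 18)

# Both fields are zero-padded, so sorting the strings sorts by year,
# then by month.
sorted <- x[order(key)]

# Variant (assumption: names always end in "yy.mm.xls"): take the same
# five characters counting back from the end, so the prefix length no
# longer matters.
n <- nchar(x)
key_from_end <- substr(x, n - 8, n - 4)
sorted_from_end <- x[order(key_from_end)]
```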

      
