Splitting a string into substrings of different lengths in R

I've read similar threads, but my substrings have different lengths (9, 3, and 5 characters), and I couldn't find an answer that covers that case.

I need to split a 17-character string into three substrings: the first 9 characters, the next 3, and the last 5.

Example:

 N12345671004UN005
 N34567892902UN002 

      

I would like to split the rows into three columns:

First column, 9 characters:

"N12345671"      
"N34567892"

      

Second column, 3 characters:

"004"          
"902"

      

Third column, 5 characters:

"UN005"  
"UN002"

      



2 answers


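# substr() is vectorised over instr; start and stop are 1-based and inclusive
instr = c("N12345671004UN005", "N34567892902UN002")
out1 = substr(instr, 1, 9)    # characters 1 to 9
out2 = substr(instr, 10, 12)  # characters 10 to 12
out3 = substr(instr, 13, 17)  # characters 13 to 17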
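
If the widths change, the start and stop positions above can be derived from a widths vector instead of being typed by hand. The following is only a sketch in base R; the names widths, starts, ends, parts and out are made up for illustration and are not part of the original answer.

```r
instr <- c("N12345671004UN005", "N34567892902UN002")
widths <- c(9, 3, 5)

ends <- cumsum(widths)        # last character of each field: 9 12 17
starts <- ends - widths + 1   # first character of each field: 1 10 13

# Map() pairs each start with its end and returns one character vector per field
parts <- Map(function(s, e) substr(instr, s, e), starts, ends)
names(parts) <- paste0("V", seq_along(widths))

# Keep the pieces as character so leading zeros such as "004" survive
out <- as.data.frame(parts, stringsAsFactors = FALSE)
```

Here out has the columns V1, V2 and V3, one row per input string.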

      





You can try read.fwf and specify the widths:

ff <- tempfile()
cat(file=ff, instr, sep='\n')
read.fwf(ff, widths=c(9,3,5), colClasses=rep('character', 3))
#        V1  V2    V3
#1 N12345671 004 UN005
#2 N34567892 902 UN002
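
If you would rather not write a temporary file, read.fwf also accepts a connection, so the vector can probably be passed through textConnection directly. This is a sketch of that variant, not part of the original answer; the name df is only illustrative.

```r
instr <- c("N12345671004UN005", "N34567892902UN002")

# textConnection() presents each element of instr as one line of input
df <- read.fwf(textConnection(instr), widths = c(9, 3, 5),
               colClasses = rep("character", 3))
```

The result should match the data frame shown above, with V2 kept as text.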

      

Or using tidyr/dplyr

library(dplyr)
library(tidyr)
as.data.frame(instr) %>%
       extract(instr, into=paste0('V', 1:3), '(.{9})(.{3})(.{5})')
#         V1  V2    V3
#1 N12345671 004 UN005
#2 N34567892 902 UN002

      



Or a combination of sub and read.table:

read.table(text=sub('(.{9})(.{3})(.{5})', '\\1 \\2 \\3', instr),
              colClasses=rep('character', 3))
#         V1  V2    V3
#1 N12345671 004 UN005 
#2 N34567892 902 UN002
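
The colClasses argument in these calls matters because of the middle field. Without it, read.table guesses the column types, and a value such as "004" is read as a number. A small sketch of the difference (the names spaced, guessed and kept are only illustrative):

```r
instr <- c("N12345671004UN005", "N34567892902UN002")
spaced <- sub('(.{9})(.{3})(.{5})', '\\1 \\2 \\3', instr)

guessed <- read.table(text = spaced)  # column types inferred
kept <- read.table(text = spaced, colClasses = rep("character", 3))

guessed$V2  # numeric 4 and 902, leading zeros lost
kept$V2     # "004" "902"
```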

      

data

instr = c("N12345671004UN005", "N34567892902UN002")

      
